Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2016/06/02 07:30:59 UTC

[jira] [Comment Edited] (SPARK-15691) Refactor and improve Hive support

    [ https://issues.apache.org/jira/browse/SPARK-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15311878#comment-15311878 ] 

Xiao Li edited comment on SPARK-15691 at 6/2/16 7:30 AM:
---------------------------------------------------------

IMO, this is the first component we need to refactor, but it is a very interesting part. Many concepts are mixed in the same class: {{SparkSession}}, {{SessionState}}, {{DataSource}}, {{parser}}, Hive-specific {{analyzer rules}}, {{cache}}, {{MetastoreRelation}}, {{MetaStorePartitionedTableFileCatalog}} ... I am still trying to split it in a clean way.


was (Author: smilegator):
IMO, this is the first piece of component we need to refactor, but this is a very interesting part. Many concepts are mixed in the same class: {SparkSession}, {SessionState}, {DataSource}, {parser}, Hive-specific {analyzer rules}, {cache}, {MetastoreRelation}, {MetaStorePartitionedTableFileCatalog} ... Still trying to split it in a clean way.

> Refactor and improve Hive support
> ---------------------------------
>
>                 Key: SPARK-15691
>                 URL: https://issues.apache.org/jira/browse/SPARK-15691
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Reynold Xin
>
> Hive support is important to Spark SQL, as many Spark users use it to read from Hive. The current architecture is very difficult to maintain, and this ticket tracks progress towards getting us to a sane state.
> A number of things we want to accomplish are:
> - Move the Hive-specific catalog logic into HiveExternalCatalog.
>   -- Remove HiveSessionCatalog. All Hive-related logic should go into HiveExternalCatalog. This would require moving caching either into HiveExternalCatalog, or just into SessionCatalog.
>   -- Move the use of properties to store data source options into HiveExternalCatalog.
>   -- Potentially more.
> - Remove the Hive-specific ScriptTransform implementation and make it more general so we can put it in sql/core.
> - Implement HiveTableScan (and write path) as a data source, so we don't need a special planner rule for HiveTableScan.
> - Remove HiveSharedState and HiveSessionState.
> One thing that is still unclear to me is how to handle Hive UDF support. We might still need a special planner rule there.


