Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/11/03 08:34:58 UTC
[jira] [Created] (SPARK-18243) Converge the insert path of Hive tables with data source tables
Reynold Xin created SPARK-18243:
-----------------------------------
Summary: Converge the insert path of Hive tables with data source tables
Key: SPARK-18243
URL: https://issues.apache.org/jira/browse/SPARK-18243
Project: Spark
Issue Type: Sub-task
Components: SQL
Reporter: Reynold Xin
Inserting data into Hive tables has its own implementation, distinct from the data source path: InsertIntoHiveTable, SparkHiveWriterContainer and SparkHiveDynamicPartitionWriterContainer.
I think it should be possible to unify these with the data source implementation, InsertIntoHadoopFsRelationCommand. We can start by implementing an OutputWriterFactory/OutputWriter that uses Hive's serdes to write data.
Note that one other major difference is that data source tables write directly to the final destination without a staging directory, and then Spark itself adds the partitions/tables to the catalog. Hive tables instead write to a staging directory first, and then call the Hive metastore's loadPartition/loadTable functions to load the data in.
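To illustrate the proposed direction, here is a minimal Scala sketch. The OutputWriter/OutputWriterFactory traits below are simplified, hypothetical stand-ins for Spark's internal interfaces in org.apache.spark.sql.execution.datasources (the real ones take an InternalRow, a schema, and a TaskAttemptContext), and the serde is modeled as a plain row-to-string function rather than Hive's actual SerDe API. This is a sketch of the shape of the unification, not the implementation.

```scala
// Simplified stand-ins for Spark's internal write interfaces (hypothetical;
// the real traits have richer signatures).
trait OutputWriter {
  def write(row: Seq[Any]): Unit
  def close(): Unit
}

trait OutputWriterFactory {
  def newInstance(path: String): OutputWriter
}

// Toy writer that serializes each row with a pluggable "serde" function,
// standing in for a Hive SerDe. Rows are buffered in memory here; a real
// writer would stream serialized bytes to `path` on the file system.
class SerdeOutputWriter(path: String, serde: Seq[Any] => String)
    extends OutputWriter {
  val rows = scala.collection.mutable.ArrayBuffer.empty[String]
  def write(row: Seq[Any]): Unit = rows += serde(row)
  def close(): Unit = () // a real writer would flush `rows` to `path`
}

// Factory wiring a chosen serde into every writer it creates, mirroring
// how an OutputWriterFactory would carry Hive table/serde properties.
class SerdeOutputWriterFactory(serde: Seq[Any] => String)
    extends OutputWriterFactory {
  override def newInstance(path: String): SerdeOutputWriter =
    new SerdeOutputWriter(path, serde)
}
```

With such a factory in place, InsertIntoHadoopFsRelationCommand could drive the Hive write path the same way it drives Parquet or JSON, leaving only the staging-directory/loadPartition difference to reconcile.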
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org