Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/11/03 08:34:58 UTC

[jira] [Created] (SPARK-18243) Converge the insert path of Hive tables with data source tables

Reynold Xin created SPARK-18243:
-----------------------------------

             Summary: Converge the insert path of Hive tables with data source tables
                 Key: SPARK-18243
                 URL: https://issues.apache.org/jira/browse/SPARK-18243
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
            Reporter: Reynold Xin


Inserting data into Hive tables goes through its own implementation, distinct from the data source path: InsertIntoHiveTable, SparkHiveWriterContainer, and SparkHiveDynamicPartitionWriterContainer.

I think it should be possible to unify these with the data source implementation, InsertIntoHadoopFsRelationCommand. We can start by implementing an OutputWriterFactory/OutputWriter that uses Hive's serdes to write data.
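To make the proposed shape concrete, here is a hedged sketch of the OutputWriterFactory/OutputWriter plumbing. These are simplified stand-ins, not Spark's actual interfaces (the real ones live in org.apache.spark.sql.execution.datasources and also take a schema and a Hadoop TaskAttemptContext), and the delimiter-based "serde" is a hypothetical placeholder for a real Hive serde:

```scala
// Simplified stand-ins for Spark's internal OutputWriterFactory /
// OutputWriter interfaces; names and signatures are illustrative only.
trait OutputWriter {
  def write(row: Seq[Any]): Unit // serialize and emit one row
  def close(): Unit              // flush/commit this output file
}

trait OutputWriterFactory {
  // One writer per output file/partition, identified here only by path.
  def newInstance(path: String): OutputWriter
}

// A writer that "serializes" rows by joining fields with a delimiter,
// buffering in memory instead of writing to HDFS. A Hive-backed writer
// would instead drive the table's serde and record writer.
class InMemoryWriter(delimiter: String) extends OutputWriter {
  private val lines = scala.collection.mutable.ArrayBuffer.empty[String]
  def write(row: Seq[Any]): Unit = lines += row.mkString(delimiter)
  def close(): Unit = ()
  def result: String = lines.mkString("\n")
}

class DelimitedWriterFactory(delimiter: String) extends OutputWriterFactory {
  override def newInstance(path: String): InMemoryWriter =
    new InMemoryWriter(delimiter)
}
```

A Hive-backed factory would wrap the table's Serializer/SerDe and the RecordWriter obtained from its output format, but the plumbing InsertIntoHadoopFsRelationCommand needs is the same shape.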

Note that one other major difference is commit semantics: data source tables write directly to the final destination without a staging directory, and Spark itself then adds the partitions/tables to the catalog. Hive tables write to a staging directory first, and then call the Hive metastore's loadPartition/loadTable functions to load the data in.
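The two commit styles above can be sketched as follows. This is a hedged illustration using local directories in place of HDFS and a plain file move in place of the metastore's loadPartition/loadTable call; the function names are mine, not Spark internals:

```scala
import java.nio.file.{Files, StandardCopyOption}

// Data source tables: write straight to the final location; Spark itself
// then registers the partition/table with the catalog (not shown here).
def directWrite(finalDir: java.nio.file.Path, data: String): java.nio.file.Path = {
  Files.createDirectories(finalDir)
  Files.write(finalDir.resolve("part-00000"), data.getBytes("UTF-8"))
}

// Hive tables: write to a staging directory first, then "load" the file
// into the final location (a stand-in for loadPartition/loadTable, which
// the Hive metastore performs on Spark's behalf).
def stagedWrite(staging: java.nio.file.Path, finalDir: java.nio.file.Path,
                data: String): java.nio.file.Path = {
  Files.createDirectories(staging)
  val tmp = Files.write(staging.resolve("part-00000"), data.getBytes("UTF-8"))
  Files.createDirectories(finalDir)
  Files.move(tmp, finalDir.resolve("part-00000"), StandardCopyOption.REPLACE_EXISTING)
}
```

Unifying the insert paths means the Hive path would either adopt the direct-write-plus-catalog-update flow, or the unified command would have to model the staging/load step explicitly.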

 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org