Posted to user@spark.apache.org by nimrodo <ni...@veracity-group.com> on 2017/02/26 13:46:48 UTC

Saving Structured Streaming DF to Hive Partitioned table

Hi,

I want to load a stream of CSV files to a partitioned Hive table called
myTable.

I tried using Spark 2 Structured Streaming to do that:
import org.apache.spark.sql.SparkSession

val spark = SparkSession
      .builder
      .appName("TrueCallLoade")
      .enableHiveSupport()
      .config("hive.exec.dynamic.partition.mode", "nonstrict")
      .config("hive.exec.dynamic.partition", "true")
      .config("hive.exec.max.dynamic.partitions", "2048")
      .config("hive.exec.max.dynamic.partitions.pernode", "256")
      .getOrCreate()

val df = spark.readStream
      .option("sep", ",")
      .option("header", "true")
      .schema(customSchema)      // customSchema and fileDirectory are defined elsewhere
      .csv(fileDirectory)

The DataFrame includes two columns, "dt" and "h", by which the Hive table is
partitioned.
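
For context, myTable looks roughly like this (simplified sketch: the
non-partition columns are placeholders for the actual CSV schema):

spark.sql("""
  CREATE TABLE IF NOT EXISTS myTable (
    col1 STRING,    -- placeholder data columns
    col2 INT
  )
  PARTITIONED BY (dt STRING, h STRING)
  STORED AS PARQUET
""")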

writeStream can't stream directly to a Hive table, so I decided to go through
a memory sink:

val query = df.writeStream
      .queryName("LoadedCSVData")
      .outputMode("append")
      .format("memory")
      .start()

and then
spark.sql("INSERT INTO myTable SELECT * FROM LoadedCSVData")

This doesn't seem to insert anything. Any idea how I can achieve this?
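
One alternative I'm considering is skipping the memory sink and writing the
stream as partitioned files directly, then exposing the directory to Hive as
an external table (untested sketch; the paths below are placeholders):

val fileQuery = df.writeStream
      .format("parquet")
      .partitionBy("dt", "h")                             // creates dt=.../h=... directories
      .option("path", "/warehouse/myTable")               // placeholder table location
      .option("checkpointLocation", "/tmp/myTable-chk")   // placeholder checkpoint dir
      .outputMode("append")
      .start()

If I went that way, I assume I'd still need to run MSCK REPAIR TABLE myTable
(or ALTER TABLE ... ADD PARTITION) after new partitions land so Hive sees them.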

Nimrod



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Saving-Structured-Streaming-DF-to-Hive-Partitioned-table-tp28424.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
