Posted to issues@spark.apache.org by "Burak Yavuz (JIRA)" <ji...@apache.org> on 2017/06/26 18:58:00 UTC
[jira] [Created] (SPARK-21216) Streaming DataFrames fail to join with Hive tables
Burak Yavuz created SPARK-21216:
-----------------------------------
Summary: Streaming DataFrames fail to join with Hive tables
Key: SPARK-21216
URL: https://issues.apache.org/jira/browse/SPARK-21216
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.1.1
Reporter: Burak Yavuz
Assignee: Burak Yavuz
The following code will throw a cryptic exception:
{code}
import org.apache.spark.sql.execution.streaming.MemoryStream
import testImplicits._

// MemoryStream needs an implicit SQLContext in scope.
implicit val _sqlContext = spark.sqlContext

Seq((1, "one"), (2, "two"), (4, "four")).toDF("number", "word").createOrReplaceTempView("t1")

// Make a table and ensure it will be broadcast.
sql("""CREATE TABLE smallTable(word string, number int)
  |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
  |STORED AS TEXTFILE
  """.stripMargin)

sql(
  """INSERT INTO smallTable
    |SELECT word, number FROM t1
  """.stripMargin)

// Join the streaming input with the Hive table.
val inputData = MemoryStream[Int]
val joined = inputData.toDS().toDF()
  .join(spark.table("smallTable"), $"value" === $"number")

val sq = joined.writeStream
  .format("memory")
  .queryName("t2")
  .start()
try {
  inputData.addData(1, 2)
  sq.processAllAvailable()
} finally {
  sq.stop()
}
{code}
When the session is a HiveSession, the planner in `IncrementalExecution` does not take the Hive scan strategies into account, so the scan of the Hive table in the streaming join cannot be planned.
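
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: none of the names below (`PlannerSketch`, `Planner`, `LogicalNode`, the two strategies) are Spark classes, and the real planner is considerably more involved. It shows only that a planner built from a fixed strategy list cannot plan a node that needs a strategy it was never given.

```scala
// Toy model only: every name here is hypothetical, not Spark's API.
object PlannerSketch {
  sealed trait LogicalNode
  case object StreamingSource extends LogicalNode
  case object HiveTable extends LogicalNode

  // A strategy returns a physical-plan label for the nodes it understands.
  type Strategy = LogicalNode => Option[String]

  val streamingStrategy: Strategy = {
    case StreamingSource => Some("streaming scan")
    case _               => None
  }

  val hiveScanStrategy: Strategy = {
    case HiveTable => Some("hive scan")
    case _         => None
  }

  final class Planner(strategies: Seq[Strategy]) {
    // Try each strategy in order; fail if none of them applies.
    def plan(node: LogicalNode): Either[String, String] =
      strategies.iterator
        .map(strategy => strategy(node))
        .collectFirst { case Some(physical) => physical }
        .toRight(s"no plan for $node")
  }

  // Planner that ignores the session's extra (Hive) strategies.
  val basePlanner = new Planner(Seq(streamingStrategy))

  // Planner that also carries the session's Hive scan strategy.
  val sessionPlanner = new Planner(Seq(streamingStrategy, hiveScanStrategy))
}
```

In this model, `PlannerSketch.basePlanner.plan(PlannerSketch.HiveTable)` yields a `Left`, while `PlannerSketch.sessionPlanner` plans the same node. By analogy, one plausible direction for a fix is for the streaming planner to reuse the strategies of the session's own planner.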