Posted to issues@spark.apache.org by "Michael Armbrust (JIRA)" <ji...@apache.org> on 2016/11/10 21:32:58 UTC
[jira] [Created] (SPARK-18407) Inferred partition columns cause assertion error
Michael Armbrust created SPARK-18407:
----------------------------------------
Summary: Inferred partition columns cause assertion error
Key: SPARK-18407
URL: https://issues.apache.org/jira/browse/SPARK-18407
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.0.2
Reporter: Michael Armbrust
Priority: Critical
[This assertion|https://github.com/apache/spark/blob/16eaad9daed0b633e6a714b5704509aa7107d6e5/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L408] fails when you run a stream against JSON data stored in partitioned directories, if you specify the schema manually and that schema omits the partition columns.
My hunch is that we still infer those columns, even though the schema is passed in manually, and append them to the end of the schema.
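The scenario described above could be set up roughly as follows. This is a minimal sketch, not a verified reproduction: the input path {{/tmp/events}}, the partition column {{date}}, the data field {{value}}, and the query name are all hypothetical, and it assumes JSON files laid out as {{/tmp/events/date=2016-11-10/...}}.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructType}

object PartitionedJsonStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SPARK-18407-sketch")
      .getOrCreate()

    // Hypothetical layout: /tmp/events/date=2016-11-10/part-0000.json, ...
    val inputPath = "/tmp/events"

    // Manually specified schema that deliberately omits the partition column `date`.
    val schema = new StructType().add("value", StringType)

    val query = spark.readStream
      .schema(schema)
      .json(inputPath)
      .writeStream
      .format("memory")     // in-memory sink, chosen only to keep the sketch small
      .queryName("events")  // hypothetical query name
      .start()

    // Process whatever input is currently available, then shut down.
    // On an affected version, the schema assertion would be expected to
    // fail while a batch is being planned.
    query.processAllAvailable()
    query.stop()
    spark.stop()
  }
}
```

If the hunch is right, the batch plan would carry an extra {{date}} column after {{value}}, while the stream's declared output contains only {{value}}.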
While we are fixing this bug, it would be nice to improve the assertion message. Truncating it is not terribly useful since, at least in my case, it cut off the most interesting part. I changed it to this while debugging:
{code}
s"""
|Batch does not have expected schema
|Expected: ${output.mkString(",")}
|Actual: ${newPlan.output.mkString(",")}
|
|== Original ==
|$logicalPlan
|
|== Batch ==
|$newPlan
""".stripMargin
{code}