Posted to issues@spark.apache.org by "Mark Hamstra (JIRA)" <ji...@apache.org> on 2017/08/03 22:02:00 UTC

[jira] [Comment Edited] (SPARK-21619) Fail the execution of canonicalized plans explicitly

    [ https://issues.apache.org/jira/browse/SPARK-21619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113555#comment-16113555 ] 

Mark Hamstra edited comment on SPARK-21619 at 8/3/17 10:01 PM:
---------------------------------------------------------------

Yes, I absolutely understand that this issue and PR are meant to address an immediate need, and that a deeper redesign would belong in one or more separate issues. I'm more trying to raise awareness, or improve my own understanding, than to delay or block progress on addressing the immediate need.

I do have concerns, though, that making canonical plans unexecutable just because they are in canonical form makes certain evolutions of Spark more difficult. As one half-baked example: you might want to decouple query plans from a single execution engine, so that certain kinds of logical plans could be dispatched for execution to one engine (or cluster configuration) while other plans are directed to a separate engine (presumably one more suitable to those plans in some way). Splitting and forking Spark's query execution pipeline in that way isn't really that difficult (I've done it in at least proof-of-concept form), and it has some potentially significant benefits. To do that, though, you'd really like to have a single, canonical form for any semantically equivalent queries by the time they reach the dispatch function that determines the destination execution engine for a query (and where results will be cached locally, etc.). Making the canonical form unexecutable throws a wrench into that.
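To make the concern concrete, here is a minimal sketch of the dispatch idea, in Python rather than Spark's Scala internals. All names (Plan, canonicalize, choose_engine, the engine labels) are hypothetical, invented purely to illustrate why a dispatcher would want the canonical form to remain a first-class, executable plan:

```python
# Hypothetical sketch -- NOT Spark's actual API. Illustrates routing
# semantically equivalent queries to an engine via their canonical form.

from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """A toy stand-in for a logical query plan node."""
    op: str                  # e.g. "scan_t1", "join", "aggregate"
    children: tuple = ()


def canonicalize(plan: Plan) -> Plan:
    """Toy canonicalization: recursively sort children so that
    semantically equivalent plans compare equal."""
    kids = tuple(sorted((canonicalize(c) for c in plan.children),
                        key=lambda p: p.op))
    return Plan(plan.op, kids)


def choose_engine(plan: Plan) -> str:
    """Route the *canonical* plan to an execution engine.

    If canonical plans refuse to execute, the chosen engine cannot
    simply run the plan it was handed -- the original, non-canonical
    plan has to be carried along separately."""
    canonical = canonicalize(plan)
    if canonical.op == "aggregate":
        return "engine-b"    # e.g. a cluster tuned for aggregations
    return "engine-a"


# Two plans that differ only in join-child order canonicalize
# identically, so the dispatcher sends both to the same engine:
p1 = Plan("join", (Plan("scan_t1"), Plan("scan_t2")))
p2 = Plan("join", (Plan("scan_t2"), Plan("scan_t1")))
assert canonicalize(p1) == canonicalize(p2)
assert choose_engine(p1) == choose_engine(p2)
```

The point of the sketch is the last two assertions: equal canonical forms are what let the dispatcher (and any result cache keyed on them) treat equivalent queries uniformly.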



> Fail the execution of canonicalized plans explicitly
> ----------------------------------------------------
>
>                 Key: SPARK-21619
>                 URL: https://issues.apache.org/jira/browse/SPARK-21619
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Reynold Xin
>            Assignee: Reynold Xin
>
> Canonicalized plans are not supposed to be executed. I ran into a case in which there's some code that accidentally calls execute on a canonicalized plan. This patch throws a more explicit exception when that happens.
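(The behavior described above can be sketched as follows. This is a hedged illustration in Python, not Spark's actual Scala internals; the class and method names are invented. The idea is simply that a plan flagged as canonicalized fails fast with an explicit error on execute, instead of silently producing wrong results.)

```python
# Illustrative sketch only -- names are hypothetical, not Spark's API.

class LogicalPlan:
    def __init__(self, op: str, canonicalized: bool = False):
        self.op = op
        self.canonicalized = canonicalized

    def canonicalize(self) -> "LogicalPlan":
        # Return a normalized copy, flagged as non-executable.
        return LogicalPlan(self.op.lower(), canonicalized=True)

    def execute(self) -> str:
        # Fail explicitly rather than executing a canonicalized plan.
        if self.canonicalized:
            raise RuntimeError(
                "Plans in canonicalized form cannot be executed")
        return f"executing {self.op}"


plan = LogicalPlan("Scan")
plan.execute()                  # fine: the original plan runs
try:
    plan.canonicalize().execute()
except RuntimeError:
    pass                        # explicit failure, as the patch intends
```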



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org