Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2015/08/17 15:21:45 UTC

[jira] [Commented] (SPARK-9999) RDD-like API on top of Catalyst/DataFrame

    [ https://issues.apache.org/jira/browse/SPARK-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14699512#comment-14699512 ] 

Herman van Hovell commented on SPARK-9999:
------------------------------------------

This sounds interesting.

To get this working, we need more information about the (black-box) operators used. We would therefore need either some analysis capability or a set of predefined building blocks (SQL-lite, if you will). Apache Flink uses static code analysis and annotations to achieve a similar goal:
http://flink.apache.org/news/2015/06/24/announcing-apache-flink-0.9.0-release.html
https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/programming_guide.html#semantic-annotations
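To make the annotation idea concrete, here is a minimal, self-contained Scala sketch of what such a hint could look like. This is not Flink's or Spark's API; all names (ForwardedFields, Hinted, preservesPartitioningOn) are hypothetical. The point is that a black-box function declares which input fields it forwards unchanged, so a planner can keep physical properties (sortedness, partitioning) on those fields instead of treating the lambda as fully opaque:

```scala
// Hypothetical semantic-annotation sketch (not Flink/Spark API).
// A black-box operator declares which input fields pass through unchanged.

case class ForwardedFields(fields: Set[String])

// A user function bundled with its static hints.
case class Hinted[A, B](f: A => B, hints: ForwardedFields)

case class Record(key: String, value: Int)

// This function rewrites `value` but forwards `key` untouched, and says so.
val incrementValue = Hinted[Record, Record](
  r => r.copy(value = r.value + 1),
  ForwardedFields(Set("key"))
)

// A planner could consult the hint instead of assuming the worst:
def preservesPartitioningOn(field: String, op: Hinted[_, _]): Boolean =
  op.hints.fields.contains(field)
```

Given this, a hypothetical planner could conclude that data partitioned by `key` stays partitioned by `key` across `incrementValue`, while making no such assumption for `value`.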

Any other ideas?



> RDD-like API on top of Catalyst/DataFrame
> -----------------------------------------
>
>                 Key: SPARK-9999
>                 URL: https://issues.apache.org/jira/browse/SPARK-9999
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>            Reporter: Reynold Xin
>
> The RDD API is very flexible, but as a result its execution is harder to optimize in some cases. The DataFrame API, on the other hand, is much easier to optimize, but lacks some of the nice perks of the RDD API (e.g., UDFs are harder to use, and there are no strong types in Scala/Java).
> As a Spark user, I want an API that sits somewhere in the middle of that spectrum, so I can write most of my applications with it and still have Spark optimize it well for performance and stability.
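As a rough illustration of the middle ground being asked for, here is a tiny self-contained Scala sketch (none of this is Spark API; TypedSet, Plan, and filterOn are made-up names). The user-facing surface is strongly typed like an RDD, while each operation also records a small logical plan that an engine could inspect and optimize, like a DataFrame:

```scala
// Hypothetical sketch of a typed-but-optimizable API (not Spark code).

// A toy logical plan the engine could analyze.
sealed trait Plan
case class Source(name: String) extends Plan
case class FilterOn(field: String, child: Plan) extends Plan

case class Person(name: String, age: Int)

// User-facing: strongly typed like an RDD; internally: a plan like a DataFrame.
case class TypedSet[A](data: Seq[A], plan: Plan) {
  // The caller names the field the predicate touches, so the plan stays
  // analyzable even though the predicate itself is a black-box lambda.
  def filterOn(field: String)(p: A => Boolean): TypedSet[A] =
    TypedSet(data.filter(p), FilterOn(field, plan))
}

val people = TypedSet(Seq(Person("ann", 30), Person("bo", 17)), Source("people"))
val adults = people.filterOn("age")(_.age >= 18)
// adults.data is the typed result; adults.plan is an inspectable tree
```

The design question the issue raises is exactly the gap this sketch hand-waves over: how to recover the `"age"`-style structural information without forcing the user to supply it manually.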



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org