
Posted to issues@spark.apache.org by "Andrew Ash (JIRA)" <ji...@apache.org> on 2017/09/07 07:39:00 UTC

[jira] [Commented] (SPARK-12449) Pushing down arbitrary logical plans to data sources

    [ https://issues.apache.org/jira/browse/SPARK-12449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16156622#comment-16156622 ] 

Andrew Ash commented on SPARK-12449:
------------------------------------

[~velvia] I'm not involved with the CatalystSource or SAP HANA Vora work, so I can't comment on the direction that project is taking right now.

However, there is an ongoing effort to add a new Data Source V2 API, tracked in https://issues.apache.org/jira/browse/SPARK-15689 and being discussed on the mailing list right now, that could grow to encompass the goals of this issue.

[~stephank85] if you are able to comment on SPARK-15689, your input would be very valuable to that API design.

> Pushing down arbitrary logical plans to data sources
> ----------------------------------------------------
>
>                 Key: SPARK-12449
>                 URL: https://issues.apache.org/jira/browse/SPARK-12449
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Stephan Kessler
>         Attachments: pushingDownLogicalPlans.pdf
>
>
> With the help of the DataSource API we can pull data from external sources for processing. Implementing interfaces such as {{PrunedFilteredScan}} allows filters and projections to be pushed down, pruning unnecessary rows and columns directly in the data source.
> However, data sources such as SQL engines are capable of even more preprocessing, e.g., evaluating aggregates. This is beneficial because it reduces the amount of data transferred from the source to Spark. The existing interfaces do not allow this kind of processing in the source.
> We propose adding a new interface, {{CatalystSource}}, that allows the processing of arbitrary logical plans to be deferred to the data source. We presented the details at Spark Summit Europe 2015 [https://spark-summit.org/eu-2015/events/the-pushdown-of-everything/].
> I will add a design document explaining the details.

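For reference, the existing Scala interface in {{org.apache.spark.sql.sources}} exposes {{buildScan(requiredColumns: Array[String], filters: Array[Filter]): RDD[Row]}}. The following is only a toy model in plain Python (no Spark dependency) of what that contract asks of a source: drop rows that fail the pushed filters and return just the requested columns, so neither leaves the source. The classes {{EqualTo}}, {{GreaterThan}} and {{ToyPrunedFilteredSource}} are hypothetical stand-ins, not Spark's API.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Sequence, Tuple

Row = Dict[str, Any]


@dataclass(frozen=True)
class EqualTo:
    """Toy stand-in for a pushed-down predicate: attribute == value."""
    attribute: str
    value: Any


@dataclass(frozen=True)
class GreaterThan:
    """Toy stand-in for a pushed-down predicate: attribute > value."""
    attribute: str
    value: Any


def matches(row: Row, f: Any) -> bool:
    """Evaluate one toy filter against a row."""
    if isinstance(f, EqualTo):
        return row[f.attribute] == f.value
    if isinstance(f, GreaterThan):
        return row[f.attribute] > f.value
    raise TypeError(f"unsupported filter: {f!r}")


class ToyPrunedFilteredSource:
    """Hypothetical in-memory source modelled on the buildScan contract."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows

    def build_scan(self, required_columns: Sequence[str],
                   filters: Sequence[Any]) -> List[Tuple[Any, ...]]:
        # Keep only rows that pass every filter, then project the requested
        # columns, so pruned rows and fields never leave the source.
        return [
            tuple(row[c] for c in required_columns)
            for row in self.rows
            if all(matches(row, f) for f in filters)
        ]


sales = [
    {"id": 1, "region": "EU", "amount": 10},
    {"id": 2, "region": "US", "amount": 25},
    {"id": 3, "region": "EU", "amount": 40},
]
source = ToyPrunedFilteredSource(sales)
result = source.build_scan(["id", "amount"], [EqualTo("region", "EU")])
print(result)  # [(1, 10), (3, 40)]
```

In Spark itself these filters are best effort: by default Spark re-applies them after the scan unless the relation reports them as handled via {{BaseRelation.unhandledFilters}}. Note also that the contract stops at rows and columns; an aggregate such as a per-region sum still has to be computed in Spark over every surviving row.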

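The proposal goes further and defers whole subtrees of the logical plan to the source. The sketch below is a toy model in plain Python, not Catalyst: the operator classes, {{ToySqlSource}}, {{supports}} and {{run}} are hypothetical names chosen for illustration and are not the interface from the design document. It assumes a source that can evaluate scans, filters and grouped sums but not limits, pushes down the largest subtree the source accepts, finishes the remaining operators in the engine, and counts the rows that cross the source boundary, which is the quantity the description argues aggregate pushdown reduces.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple, Union

Row = Dict[str, Any]


# Toy logical-plan operators (hypothetical; far simpler than Catalyst's LogicalPlan).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class Filter:
    column: str
    value: Any
    child: Plan

@dataclass(frozen=True)
class SumBy:
    key: str
    value: str
    child: Plan

@dataclass(frozen=True)
class Limit:
    n: int
    child: Plan

Plan = Union[Scan, Filter, SumBy, Limit]


def apply_op(op: Plan, rows: List[Row]) -> List[Row]:
    """Apply one non-Scan operator to the rows produced by its child."""
    if isinstance(op, Filter):
        return [r for r in rows if r[op.column] == op.value]
    if isinstance(op, SumBy):
        totals: Dict[Any, Any] = defaultdict(int)
        for r in rows:
            totals[r[op.key]] += r[op.value]
        return [{op.key: k, op.value: v} for k, v in sorted(totals.items())]
    if isinstance(op, Limit):
        return rows[: op.n]
    raise TypeError(f"unsupported operator: {op!r}")


def evaluate(plan: Plan, tables: Dict[str, List[Row]]) -> List[Row]:
    """Reference evaluator with identical semantics in the source and the engine."""
    if isinstance(plan, Scan):
        return list(tables[plan.table])
    return apply_op(plan, evaluate(plan.child, tables))


class ToySqlSource:
    """Hypothetical source that can evaluate Scan, Filter and SumBy, but not Limit."""

    SUPPORTED = (Scan, Filter, SumBy)

    def __init__(self, tables: Dict[str, List[Row]]) -> None:
        self.tables = tables

    def supports(self, plan: Plan) -> bool:
        # A subtree is supported only if every operator in it is supported.
        if not isinstance(plan, self.SUPPORTED):
            return False
        return isinstance(plan, Scan) or self.supports(plan.child)

    def execute(self, plan: Plan) -> List[Row]:
        return evaluate(plan, self.tables)


def run(plan: Plan, source: ToySqlSource) -> Tuple[List[Row], Plan, int]:
    """Push the largest supported subtree down and finish the rest in the engine.

    Returns (result rows, pushed-down subtree, rows transferred from the source).
    """
    if source.supports(plan):
        rows = source.execute(plan)
        return rows, plan, len(rows)
    if isinstance(plan, Scan):
        raise ValueError("source cannot scan this table")
    rows, pushed, transferred = run(plan.child, source)
    return apply_op(plan, rows), pushed, transferred


tables = {"sales": [
    {"region": "EU", "amount": 10},
    {"region": "US", "amount": 25},
    {"region": "EU", "amount": 40},
]}
query = Limit(1, SumBy("region", "amount", Scan("sales")))
rows, pushed, transferred = run(query, ToySqlSource(tables))
print(rows)                   # [{'region': 'EU', 'amount': 50}]
print(pushed == query.child)  # True
print(transferred)            # 2
```

Here only the 2 aggregated rows leave the source, against the 3 raw rows a scan-only pushdown would transfer; on a real table the gap grows with the number of rows per group. The all-or-nothing check per subtree is a simplification: a real design would also need to decide support for individual expressions, not just operators.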
