Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/02/16 11:07:18 UTC

[jira] [Resolved] (SPARK-2369) Enable Spark SQL UDF to influence at runtime the decision to read a partition

     [ https://issues.apache.org/jira/browse/SPARK-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-2369.
------------------------------
    Resolution: Won't Fix

> Enable Spark SQL UDF to influence at runtime the decision to read a partition
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-2369
>                 URL: https://issues.apache.org/jira/browse/SPARK-2369
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 1.0.0
>            Reporter: Mansour Raad
>              Labels: UDF
>
> Let's say I have a custom partitioner on my RDD, and that RDD is registered as a SQL table. I want to run a query such as "select myfield from mytable where myudf(myfield, 'some condition') = somevalue" without performing a full table scan to get myfield.
> However, if the UDF API were extended so that Spark could "ask" the UDF at runtime whether the current partition is "valid", then only the valid partitions would be scanned.
> I see the UDF API being extended with a method such as:
> readPartition(partitioner: Partitioner, partitionId: Int): Boolean
> where I can cast partitioner to my own custom one and, based on the given partition id and runtime arguments, the method decides whether to read that partition.
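
For illustration only, here is a minimal sketch of how the proposed hook might behave. It is not Spark code and does not reflect any API that Spark adopted (the issue was resolved as Won't Fix). The `Partitioner` trait below is a local stand-in so the example runs without Spark on the classpath, and `RangeBucketPartitioner`, `PartitionAwareUdf`, `InRangeUdf` and `partitionsToRead` are hypothetical names introduced for this example.

```scala
// Local stand-in for a partitioner interface (assumption: not the Spark class).
trait Partitioner {
  def numPartitions: Int
  def getPartition(key: Any): Int
}

// Hypothetical custom partitioner: Int keys fall into fixed-width buckets,
// with out-of-range keys clamped to the first or last partition.
final class RangeBucketPartitioner(val numPartitions: Int, val width: Int)
    extends Partitioner {
  def getPartition(key: Any): Int = key match {
    case k: Int => math.min(math.max(k, 0) / width, numPartitions - 1)
    case _      => 0
  }
}

// Hypothetical UDF contract carrying the method proposed in the issue.
trait PartitionAwareUdf {
  def readPartition(partitioner: Partitioner, partitionId: Int): Boolean
}

// Example UDF for a predicate equivalent to `lo <= myfield < hi` (assumes lo < hi).
final class InRangeUdf(lo: Int, hi: Int) extends PartitionAwareUdf {
  def readPartition(partitioner: Partitioner, partitionId: Int): Boolean =
    partitioner match {
      // Cast to the custom partitioner, as the issue suggests, and keep only
      // the partitions whose buckets can overlap [lo, hi).
      case p: RangeBucketPartitioner =>
        val first = p.getPartition(lo)
        val last  = p.getPartition(hi - 1)
        partitionId >= first && partitionId <= last
      // Unknown partitioner: pruning is not safe, so read every partition.
      case _ => true
    }
}

object PartitionPruningSketch {
  // Ask the UDF about each partition id and keep the ones it wants scanned.
  def partitionsToRead(udf: PartitionAwareUdf, p: Partitioner): Seq[Int] =
    (0 until p.numPartitions).filter(id => udf.readPartition(p, id))

  def main(args: Array[String]): Unit = {
    val partitioner = new RangeBucketPartitioner(numPartitions = 8, width = 100)
    val udf = new InRangeUdf(lo = 250, hi = 420)
    println(partitionsToRead(udf, partitioner).mkString(",")) // prints 2,3,4
  }
}
```

In this sketch only three of the eight partitions would be scanned. The fallback to `true` for an unrecognized partitioner is a deliberate choice: skipping a partition is only safe when the UDF can prove that no matching rows live there. At the RDD level, Spark's developer API `PartitionPruningRDD` accepts a similar partition-index predicate, though it is not wired into SQL UDFs.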



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org