Posted to issues@spark.apache.org by "Herman van Hovell (JIRA)" <ji...@apache.org> on 2018/04/10 10:37:00 UTC
[jira] [Commented] (SPARK-23945) Column.isin() should accept a single-column DataFrame as input
[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432039#comment-16432039 ]
Herman van Hovell commented on SPARK-23945:
-------------------------------------------
[~nchammas] we didn't add explicit Dataset support because no one asked for it, until now :)
What do you want to support here? {{(NOT) IN}} and {{EXISTS}}? Or do you also want to add support for scalar subqueries, and subqueries in filters?
> Column.isin() should accept a single-column DataFrame as input
> --------------------------------------------------------------
>
> Key: SPARK-23945
> URL: https://issues.apache.org/jira/browse/SPARK-23945
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Nicholas Chammas
> Priority: Minor
>
> In SQL you can filter rows based on the result of a subquery:
> {code:sql}
> SELECT *
> FROM table1
> WHERE name NOT IN (
>     SELECT name
>     FROM table2
> );{code}
> In the Spark DataFrame API, the equivalent would probably look like this:
> {code:python}
> (table1
>     .where(
>         ~col('name').isin(
>             table2.select('name')
>         )
>     )
> ){code}
> However, .isin() currently [only accepts a local list of values|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.isin].
> I imagine making this enhancement would happen as part of a larger effort to support correlated subqueries in the DataFrame API.
> Or perhaps there is no plan to support this style of query in the DataFrame API, and queries like this should instead be written in a different way? How would we write a query like the one I have above in the DataFrame API, without needing to collect values locally for the NOT IN filter?
>