Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2016/06/07 18:34:20 UTC
[jira] [Created] (SPARK-15807) Support varargs for distinct/dropDuplicates in Dataset/DataFrame
Dongjoon Hyun created SPARK-15807:
-------------------------------------
Summary: Support varargs for distinct/dropDuplicates in Dataset/DataFrame
Key: SPARK-15807
URL: https://issues.apache.org/jira/browse/SPARK-15807
Project: Spark
Issue Type: Improvement
Components: SQL
Reporter: Dongjoon Hyun
This issue adds varargs variants of `distinct` and `dropDuplicates` to `Dataset`/`DataFrame`. Currently, `distinct` takes no arguments, and `dropDuplicates` accepts only a `Seq` or an `Array` of column names.
{code}
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]
scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]
scala> ds.dropDuplicates("_1", "_2")
<console>:26: error: overloaded method value dropDuplicates with alternatives:
(colNames: Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
(colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
cannot be applied to (String, String)
ds.dropDuplicates("_1", "_2")
^
scala> ds.distinct("_1", "_2")
<console>:26: error: too many arguments for method distinct: ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
ds.distinct("_1", "_2")
{code}
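One way such an overload could be shaped is sketched below on a toy class, not on Spark itself. `ToyTable` and `VarargsDemo` are hypothetical names used only for illustration, and the deduplication logic is a stand-in rather than Spark's implementation. The sketch assumes the varargs overload takes a leading `String` followed by `String*`, because a bare `String*` parameter erases to `Seq[String]` and would clash with the existing `Seq` overload.

```scala
// Toy stand-in for a Dataset; hypothetical, NOT Spark's implementation.
final case class ToyTable(rows: Seq[Map[String, Any]]) {

  // Existing-style overload: column names passed as a Seq.
  // Keeps the first row seen for each distinct combination of values.
  def dropDuplicates(colNames: Seq[String]): ToyTable = {
    val seen = scala.collection.mutable.Set.empty[Seq[Any]]
    // Set.add returns true only when the key was not already present.
    ToyTable(rows.filter(row => seen.add(colNames.map(row))))
  }

  // Assumed varargs overload: a leading String plus String* avoids the
  // erasure clash with the Seq[String] overload above.
  def dropDuplicates(col1: String, cols: String*): ToyTable =
    dropDuplicates(col1 +: cols)
}

object VarargsDemo extends App {
  // Same three rows as the REPL session in this issue.
  val t = ToyTable(Seq(
    Map("_1" -> "a", "_2" -> 1),
    Map("_1" -> "b", "_2" -> 2),
    Map("_1" -> "a", "_2" -> 2)))

  println(t.dropDuplicates("_1").rows.size)        // 2
  println(t.dropDuplicates("_1", "_2").rows.size)  // 3
}
```

With this shape, `t.dropDuplicates("_1", "_2")` compiles and delegates to the `Seq` overload, so both call styles share one code path.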