Posted to issues@spark.apache.org by "Jason Moore (JIRA)" <ji...@apache.org> on 2017/05/02 08:12:04 UTC
[jira] [Commented] (SPARK-20411) New features for expression.scalalang.typed
[ https://issues.apache.org/jira/browse/SPARK-20411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992517#comment-15992517 ]
Jason Moore commented on SPARK-20411:
-------------------------------------
And, ideally, anything else within org.apache.spark.sql.functions (e.g. countDistinct). We're looking to replace our use of DataFrames with Datasets, which means finding a replacement for all the aggregation functions that we use. If I end up putting together some functions myself, I'll pop back here to contribute them.
> New features for expression.scalalang.typed
> -------------------------------------------
>
> Key: SPARK-20411
> URL: https://issues.apache.org/jira/browse/SPARK-20411
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1, 2.1.0
> Reporter: Loic Descotte
> Priority: Minor
>
> In Spark 2 it is possible to use typed expressions for aggregation methods:
> {code}
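> import org.apache.spark.sql.expressions.scalalang._
> import spark.implicits._ // assumes a SparkSession named spark; needed for 'productId and the encoders
> dataset.groupByKey(_.productId).agg(typed.sum[Token](_.score)).toDF("productId", "sum").orderBy('productId).show()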
> {code}
> It seems that only avg, count and sum are defined: https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/expressions/scalalang/typed.html
> It is very nice to be able to use a typesafe DSL, but it would be good to have more possibilities, like min and max functions.
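Until such functions exist in Spark itself, the min and max requested above could probably be written by hand on top of org.apache.spark.sql.expressions.Aggregator. What follows is only a minimal sketch under stated assumptions, not Spark's implementation: the names TypedMaxDouble, TypedMinDouble and typedExtras are hypothetical, and the sketch handles Double-valued fields only. The infinite zero values are never visible in practice because groupByKey does not produce empty groups.

```scala
import org.apache.spark.sql.{Encoder, Encoders, TypedColumn}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical helper (not part of Spark): typed max over a Double field.
class TypedMaxDouble[IN](f: IN => Double) extends Aggregator[IN, Double, Double] {
  // Identity element for max; only returned if no rows are reduced.
  override def zero: Double = Double.NegativeInfinity
  // Fold one input row into the running maximum.
  override def reduce(b: Double, a: IN): Double = math.max(b, f(a))
  // Combine partial maxima computed on different partitions.
  override def merge(b1: Double, b2: Double): Double = math.max(b1, b2)
  override def finish(reduction: Double): Double = reduction
  override def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  override def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Hypothetical helper (not part of Spark): typed min over a Double field.
class TypedMinDouble[IN](f: IN => Double) extends Aggregator[IN, Double, Double] {
  // Identity element for min; only returned if no rows are reduced.
  override def zero: Double = Double.PositiveInfinity
  override def reduce(b: Double, a: IN): Double = math.min(b, f(a))
  override def merge(b1: Double, b2: Double): Double = math.min(b1, b2)
  override def finish(reduction: Double): Double = reduction
  override def bufferEncoder: Encoder[Double] = Encoders.scalaDouble
  override def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Hypothetical companion to Spark's `typed` object, mirroring its call style.
object typedExtras {
  def max[IN](f: IN => Double): TypedColumn[IN, Double] = new TypedMaxDouble(f).toColumn
  def min[IN](f: IN => Double): TypedColumn[IN, Double] = new TypedMinDouble(f).toColumn
}
```

With the Token type from the description, usage would presumably look like dataset.groupByKey(_.productId).agg(typedExtras.max[Token](_.score)), in the same way typed.sum is used above.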