Posted to user@spark.apache.org by Richard Cobbe <ri...@oracle.com> on 2016/02/18 17:31:53 UTC

UDAF support for DataFrames in Spark 1.5.0?

I'm working on an application that uses DataFrames (Scala API) in Spark 1.5.0,
and we need to define and use several custom aggregators.  However, I'm
having trouble figuring out how to do this.

First, in which version of Spark did UDAF support land?  Has it in fact
landed at all?

https://issues.apache.org/jira/browse/SPARK-3947 suggests that UDAFs should
be available in 1.5.0.  However, the associated pull request includes
classes like org.apache.spark.sql.UDAFRegistration, yet these classes don't
appear in the API docs, and I'm not able to use them from the Spark shell
("type UDAFRegistration is not a member of package org.apache.spark.sql").

I don't have access to a Spark 1.6.0 installation, but UDAFRegistration
doesn't appear in the Scaladoc pages for 1.6 either.

Second, assuming that this functionality is supported in some version of
Spark, could someone point me to some documentation or an example that
demonstrates how to define and use a custom aggregation function?

Many thanks,

Richard



Re: UDAF support for DataFrames in Spark 1.5.0?

Posted by Richard Cobbe <ri...@oracle.com>.
On Thu, Feb 18, 2016 at 11:18:44PM +0000, Kabeer Ahmed wrote:

> I use Spark 1.5 with the CDH 5.5 distribution, and I can see that UDAF
> support is present.  From the link:
> https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html,
> I read that this is an experimental feature, so it makes sense that it
> doesn't appear in the documentation.
>
> To confirm that it works in Spark 1.5, I quickly tried the example from
> the link, and it works.  I hope this answers your question.

Excellent -- that example is very helpful.  I was able to implement one
of our custom aggregators with no problems.
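
For the archives: the approach in that post is to extend
org.apache.spark.sql.expressions.UserDefinedAggregateFunction.  A minimal
sketch along the lines of the blog's GeometricMean example (not our
production aggregator):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class GeometricMean extends UserDefinedAggregateFunction {
      // Input: a single double column.
      def inputSchema: StructType =
        StructType(StructField("value", DoubleType) :: Nil)

      // Intermediate state: a running count and a running product.
      def bufferSchema: StructType = StructType(
        StructField("count", LongType) ::
        StructField("product", DoubleType) :: Nil)

      def dataType: DataType = DoubleType
      def deterministic: Boolean = true

      def initialize(buffer: MutableAggregationBuffer): Unit = {
        buffer(0) = 0L
        buffer(1) = 1.0
      }

      // Fold one input row into the aggregation buffer.
      def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
        buffer(0) = buffer.getLong(0) + 1
        buffer(1) = buffer.getDouble(1) * input.getDouble(0)
      }

      // Combine partial buffers computed on different partitions.
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
        buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
        buffer1(1) = buffer1.getDouble(1) * buffer2.getDouble(1)
      }

      // Final result: the nth root of the running product.
      def evaluate(buffer: Row): Any =
        math.pow(buffer.getDouble(1), 1.0 / buffer.getLong(0))
    }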

Thanks very much for your help!

Richard



Re: UDAF support for DataFrames in Spark 1.5.0?

Posted by Ted Yu <yu...@gmail.com>.
Richard:
Please see SPARK-9664, "Use sqlContext.udf to register UDAFs".
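
With that change in place, a UDAF instance can be registered under a name
and then invoked from SQL or DataFrame expressions.  A rough sketch
(GeometricMean is the class from the Databricks post; df and its columns
are placeholders):

    import org.apache.spark.sql.functions.expr

    sqlContext.udf.register("gm", new GeometricMean)
    df.groupBy("group_id")
      .agg(expr("gm(value) as geometric_mean"))
      .show()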

Cheers


Re: UDAF support for DataFrames in Spark 1.5.0?

Posted by Kabeer Ahmed <ka...@outlook.com>.
I use Spark 1.5 with the CDH 5.5 distribution, and I can see that UDAF support is present.  From the link: https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html, I read that this is an experimental feature, so it makes sense that it doesn't appear in the documentation.

To confirm that it works in Spark 1.5, I quickly tried the example from the link, and it works.  I hope this answers your question.
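
The example from the link boils down to something like the following sketch
(GeometricMean is the class from the post; the cast is just a precaution so
the input matches the UDAF's double input schema):

    import org.apache.spark.sql.functions._

    val df = sqlContext.range(1, 11).withColumn("group_id", col("id") % 2)
    val gm = new GeometricMean
    df.groupBy("group_id")
      .agg(gm(col("id").cast("double")).as("geometric_mean"))
      .show()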

Kabeer.
