Posted to dev@spark.apache.org by Franklyn D'souza <fr...@shopify.com> on 2016/06/07 21:47:46 UTC

Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following:

> ./dev/change-version-to-2.10.sh
> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6 -Pyarn -Phive


and then ran the following code in a pyspark shell:

> from pyspark.sql import SparkSession
> from pyspark.sql.types import IntegerType, StructField, StructType
> from pyspark.sql.functions import udf
> from pyspark.sql.types import Row
> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
> add_one = udf(lambda x: x + 1, IntegerType())
> schema = StructType([StructField('a', IntegerType(), False)])
> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
> df.select(add_one(df.a).alias('incremented')).collect()


This never returns a result.
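A possible sanity check, sketched here rather than taken from the thread: compute the same column with a built-in expression instead of a Python udf(). The expression below is evaluated entirely in the JVM, so if it returns while the udf() version hangs, the problem is isolated to the Python UDF execution path of the 2.10 build.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import IntegerType, Row, StructField, StructType

spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
schema = StructType([StructField('a', IntegerType(), False)])
df = spark.createDataFrame([Row(a=1), Row(a=2)], schema)

# (col('a') + 1) is a plain Column expression handled by Catalyst;
# no Python worker processes are involved in evaluating it.
df.select((col('a') + 1).alias('incremented')).collect()
# expected: [Row(incremented=2), Row(incremented=3)]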

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

Posted by Ted Yu <yu...@gmail.com>.
Please go ahead.

On Tue, Jun 7, 2016 at 4:45 PM, franklyn <fr...@shopify.com>
wrote:

> Thanks for reproducing it, Ted. Should I make a JIRA issue?

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

Posted by franklyn <fr...@shopify.com>.
Thanks for reproducing it, Ted. Should I make a JIRA issue?





Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

Posted by Ted Yu <yu...@gmail.com>.
I built with Scala 2.10

>>> df.select(add_one(df.a).alias('incremented')).collect()

The above just hung.
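To double-check which Scala version a given shell is actually running against, one option (a sketch, not something posted in this thread) is to ask the driver JVM through py4j; this assumes the SparkSession 'spark' created by the pyspark shell:

# scala.util.Properties.versionString() should be reachable via py4j's
# static forwarders on the driver JVM; prints e.g. 'version 2.10.x' for a
# -Dscala-2.10 build and 'version 2.11.x' otherwise.
print(spark.sparkContext._jvm.scala.util.Properties.versionString())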

On Tue, Jun 7, 2016 at 3:31 PM, franklyn <fr...@shopify.com>
wrote:

> Thanks, Ted!
>
> I'm using
>
> https://github.com/apache/spark/commit/8f5a04b6299e3a47aca13cbb40e72344c0114860
> and building with scala-2.10
>
> I can confirm that it works with scala-2.11

Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

Posted by franklyn <fr...@shopify.com>.
Thanks, Ted!

I'm using
https://github.com/apache/spark/commit/8f5a04b6299e3a47aca13cbb40e72344c0114860
and building with scala-2.10

I can confirm that it works with scala-2.11





Re: Can't use UDFs with Dataframes in spark-2.0-preview scala-2.10

Posted by Ted Yu <yu...@gmail.com>.
With commit 200f01c8fb15680b5630fbd122d44f9b1d096e02 using Scala 2.11:

Using Python version 2.7.9 (default, Apr 29 2016 10:48:06)
SparkSession available as 'spark'.
>>> from pyspark.sql import SparkSession
>>> from pyspark.sql.types import IntegerType, StructField, StructType
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import Row
>>> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
>>> add_one = udf(lambda x: x + 1, IntegerType())
>>> schema = StructType([StructField('a', IntegerType(), False)])
>>> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
>>> df.select(add_one(df.a).alias('incremented')).collect()
[Row(incremented=2), Row(incremented=3)]

Let me build with Scala 2.10 and try again.

On Tue, Jun 7, 2016 at 2:47 PM, Franklyn D'souza <
franklyn.dsouza@shopify.com> wrote:

> I've built spark-2.0-preview (8f5a04b) with scala-2.10 using the following:
>
>> ./dev/change-version-to-2.10.sh
>> ./dev/make-distribution.sh -DskipTests -Dzookeeper.version=3.4.5 -Dcurator.version=2.4.0 -Dscala-2.10 -Phadoop-2.6 -Pyarn -Phive
>
>
> and then ran the following code in a pyspark shell:
>
>> from pyspark.sql import SparkSession
>> from pyspark.sql.types import IntegerType, StructField, StructType
>> from pyspark.sql.functions import udf
>> from pyspark.sql.types import Row
>> spark = SparkSession.builder.master('local[4]').appName('2.0 DF').getOrCreate()
>> add_one = udf(lambda x: x + 1, IntegerType())
>> schema = StructType([StructField('a', IntegerType(), False)])
>> df = spark.createDataFrame([Row(a=1),Row(a=2)], schema)
>> df.select(add_one(df.a).alias('incremented')).collect()
>
>
> This never returns a result.
>
>
>