Posted to users@zeppelin.apache.org by Michael Borst <mi...@innovation.hrs.com> on 2016/09/19 09:58:44 UTC

Problems using UDFs with PySpark on 0.6.1 / Spark 2.0

Hi all,

I am experiencing problems defining UDFs on Zeppelin 0.6.1 backed by
Spark 2.0.

Code:

%pyspark
from pyspark.sql.functions import udf

wrapped_udf = udf(lambda x: x)

The call to udf fails with the following error message:

Traceback (most recent call last):
  File "/var/folders/mr/797qcfdd0wd0vz51n4l0xmxh0000gn/T/zeppelin_pyspark-6894515524820639358.py", line 266, in <module>
    raise Exception(traceback.format_exc())
Exception: Traceback (most recent call last):
  File "/var/folders/mr/797qcfdd0wd0vz51n4l0xmxh0000gn/T/zeppelin_pyspark-6894515524820639358.py", line 264, in <module>
    exec(code)
  File "<stdin>", line 2, in <module>
  File "/usr/local/opt/apache-spark/libexec/python/pyspark/sql/functions.py", line 1789, in udf
    return UserDefinedFunction(f, returnType)
  File "/usr/local/opt/apache-spark/libexec/python/pyspark/sql/functions.py", line 1751, in __init__
    self._judf = self._create_judf(name)
  File "/usr/local/opt/apache-spark/libexec/python/pyspark/sql/functions.py", line 1758, in _create_judf
    jdt = ctx._ssql_ctx.parseDataType(self.returnType.json())
AttributeError: 'JavaMember' object has no attribute 'parseDataType'

The same code works fine when run from a regular PySpark shell. Is anyone
seeing the same issue, or able to help?

Regards

-- 

*Michael Borst*
Software Engineer

HRS Innovation Hub
Web <http://innovation.hrs.com/> • LinkedIn
<https://www.linkedin.com/company/hrs-innovation-hub> • Facebook
<https://www.facebook.com/hrsinnovation>

We're hiring! See openings <https://hrsinnovationhub.recruiterbox.com/>
Terms apply to this email: http://j.mp/email-tac

Re: Problems using UDFs with PySpark on 0.6.1 / Spark 2.0

Posted by Michael Borst <mi...@innovation.hrs.com>.
Thanks for the swift reply! Having the patch at least means it's not a
blocker anymore.

On Mon, Sep 19, 2016 at 9:18 PM, moon soo Lee <mo...@apache.org> wrote:

> Hi,
>
> Thanks for reporting the problem.
> The issue is being tracked by
> https://issues.apache.org/jira/browse/ZEPPELIN-1411 and a patch is available
> at https://github.com/apache/zeppelin/pull/1404. Hope we can merge this
> patch to master soon.
>
> Thanks!
> moon
>



Re: Problems using UDFs with PySpark on 0.6.1 / Spark 2.0

Posted by moon soo Lee <mo...@apache.org>.
Hi,

Thanks for reporting the problem.
The issue is being tracked by
https://issues.apache.org/jira/browse/ZEPPELIN-1411 and a patch is available
at https://github.com/apache/zeppelin/pull/1404. Hope we can merge this
patch to master soon.

Thanks!
moon
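
A note on the error message itself, for anyone landing here from a search. The
thread does not state the root cause, so the following is only a guess at the
mechanism: the traceback shows that ctx._ssql_ctx evaluated to a 'JavaMember'
rather than a usable SQL context, which is what one would expect if a Java
object was handed to a constructor in the wrong positional slot. The sketch
below reproduces that shape in plain Python. Every class and parameter name in
it (FakeJavaObject, FakeJavaMember, FakeSQLContext, spark_session,
jsql_context) is a hypothetical stand-in and not the real PySpark, py4j, or
Zeppelin code; the linked issue and pull request remain the authoritative
source for the actual fix.

```python
class FakeJavaMember:
    """Hypothetical stand-in for a member handle: it has a name and nothing else."""

    def __init__(self, name):
        self.name = name


class FakeJavaObject:
    """Hypothetical proxy: unknown attribute lookups return a member handle
    instead of raising, which hides the mistake until the handle is used."""

    def __init__(self, methods):
        self._methods = methods

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name in self._methods:
            return self._methods[name]
        return FakeJavaMember(name)


class FakeSQLContext:
    """Hypothetical context whose second positional parameter is a session,
    with the Java-side context as an optional third parameter (assumption)."""

    def __init__(self, spark_context, spark_session=None, jsql_context=None):
        if jsql_context is None:
            # Fall back to whatever the session claims to wrap.
            jsql_context = spark_session._jwrapped
        self._ssql_ctx = jsql_context


def parse_type(ctx, type_json):
    # Same shape as the failing line in the traceback.
    return ctx._ssql_ctx.parseDataType(type_json)


java_sql_context = FakeJavaObject({"parseDataType": lambda s: "parsed:" + s})

# Positional call: the Java object lands in the spark_session slot, so
# _ssql_ctx ends up as a member handle and the later call fails.
broken = FakeSQLContext("sc", java_sql_context)
try:
    parse_type(broken, '"string"')
    broken_error = None
except AttributeError as exc:
    broken_error = str(exc)

# Keyword call: the Java object lands in the intended slot.
fixed = FakeSQLContext("sc", jsql_context=java_sql_context)
fixed_result = parse_type(fixed, '"string"')

print(broken_error)
print(fixed_result)
```

If this guess is right, it would also be consistent with the same code working
in a regular PySpark shell, where the context is created by PySpark itself
rather than by the notebook's bootstrap script.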
