Posted to user@spark.apache.org by issues solution <is...@gmail.com> on 2017/04/19 11:42:17 UTC
java.lang.UnsupportedOperationException
Hi,
Can someone tell me why I get the following error when applying a UDF like this?
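from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def replaceCempty(x):
    # Replace nulls with an empty string, otherwise UTF-8 encode the value
    if x is None:
        return ""
    else:
        return x.encode('utf-8')

udf_replaceCempty = F.udf(replaceCempty, StringType())

dfTotaleNormalize53 = dfTotaleNormalize52.select(
    [i if i not in colprocessing else udf_replaceCempty(F.col(i)).alias(i)
     for i in dfTotaleNormalize52.columns])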
java.lang.UnsupportedOperationException
Cannot evaluate expression: PythonUDF#replaceCempty(input[77,string])
Any ideas?
Regards
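The per-value logic of replaceCempty can be checked in plain Python, without a Spark cluster. This is a sketch under stated assumptions, not part of the original thread: it assumes Python 3, where str.encode returns bytes rather than str, so the original function returns "" for None but bytes for every other value. replace_cempty_py3 is a hypothetical variant that returns str values consistently.

```python
def replace_cempty(x):
    """Same logic as replaceCempty in the question: None -> "", else UTF-8 encode."""
    if x is None:
        return ""
    return x.encode("utf-8")


def replace_cempty_py3(x):
    """Hypothetical variant that always returns str: None -> "", else unchanged."""
    return "" if x is None else x


if __name__ == "__main__":
    for value in [None, "abc", "é"]:
        # Under Python 3 the original yields '' for None but bytes for strings
        print(repr(value), repr(replace_cempty(value)), repr(replace_cempty_py3(value)))
```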
Re: java.lang.UnsupportedOperationException
Posted by Nicholas Hakobian <ni...@rallyhealth.com>.
CDH 5.5 only provides Spark 1.5. Are you managing your PySpark install
separately?
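For something like your example, you will get significantly better
performance using coalesce with lit, since built-in expressions run inside
the JVM, whereas a Python UDF sends every row out to a Python worker process.
For example: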
from pyspark.sql.functions import col, lit, coalesce

def replace_empty(icol):
    # Replace nulls in column `icol` with "", keeping the original column name
    return coalesce(col(icol), lit("")).alias(icol)
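coalesce returns its first non-null argument, so coalesce(col(icol), lit("")) maps nulls to "" and leaves every other value unchanged (unlike replaceCempty, it does not re-encode anything). A plain-Python model of those semantics, as an illustration only; the coalesce below is a local function, not pyspark.sql.functions.coalesce:

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(*values: Optional[T]) -> Optional[T]:
    """Return the first argument that is not None (SQL COALESCE semantics)."""
    for value in values:
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    # Applying coalesce(value, "") per row mirrors coalesce(col(icol), lit(""))
    column_values = ["a", None, "c"]
    print([coalesce(v, "") for v in column_values])  # ['a', '', 'c']
```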
Then use it the same way you are doing now (I would wrap your if logic in a
function; it's easier to understand):
def _if_not_in_processing(icol):
    # Leave columns outside colprocessing untouched; null-replace the rest
    return icol if (icol not in colprocessing) else replace_empty(icol)

dfTotaleNormalize53 = dfTotaleNormalize52.select(
    [_if_not_in_processing(i) for i in dfTotaleNormalize52.columns])
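The same projection can also be expressed as SQL expression strings. This is a hedged sketch, assuming plain column names (names containing spaces, dots or other special characters would need backtick quoting). build_projection is a hypothetical helper; its output is intended for DataFrame.selectExpr, e.g. dfTotaleNormalize52.selectExpr(*build_projection(dfTotaleNormalize52.columns, set(colprocessing))). Only the string-building part is shown, so it runs without Spark.

```python
from typing import Iterable, List, Set


def build_projection(columns: Iterable[str], colprocessing: Set[str]) -> List[str]:
    """Return one SQL expression per column, null-replacing those in colprocessing."""

    def _if_not_in_processing(icol: str) -> str:
        if icol not in colprocessing:
            return icol  # pass the column through unchanged
        # Same semantics as coalesce(col(icol), lit("")).alias(icol)
        return f"coalesce({icol}, '') AS {icol}"

    return [_if_not_in_processing(c) for c in columns]


if __name__ == "__main__":
    print(build_projection(["id", "name", "city"], {"name", "city"}))
    # ['id', "coalesce(name, '') AS name", "coalesce(city, '') AS city"]
```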
Otherwise, nothing obvious stands out to me as to why it isn't working. If
you actually have PySpark 1.5 rather than 1.6, I know it handles UDF
registration differently.
Hope this helps.
Nicholas Szandor Hakobian, Ph.D.
Senior Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com
On Wed, Apr 19, 2017 at 5:13 AM, issues solution <is...@gmail.com>
wrote:
> PySpark 1.6 on Cloudera 5.5 (YARN)
Re: java.lang.UnsupportedOperationException
Posted by issues solution <is...@gmail.com>.
PySpark 1.6 on Cloudera 5.5 (YARN)