You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@spark.apache.org by issues solution <is...@gmail.com> on 2017/04/19 11:42:17 UTC

java.lang.java.lang.UnsupportedOperationException

Hi ,
somone can tell  me why i get the folowing error  with udf apply  like udf

def replaceCempty(x):
    if x is None :
        return ""
    else :
        return x.encode('utf-8')
udf_replaceCempty = F.udf(replaceCempty,StringType())

dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i not in
colprocessing  else  udf_replaceCempty(F.col(i)).alias(i) for i in
dfTotaleNormalize52.columns])


java.lang.java.lang.UnsupportedOperationException

 Cannot evaluate expression: PythonUDF#replaceCempty(input[77,string])

??
regards

Re: java.lang.java.lang.UnsupportedOperationException

Posted by Nicholas Hakobian <ni...@rallyhealth.com>.

CDH 5.5 only provides Spark 1.5. Are you managing your pySpark install
separately?

For something like your example, you will get significantly better
performance using coalesce with a lit, like so:

from pyspark.sql.functions import lit, coalesce

def replace_empty(icol):
    return coalesce(col(icol), lit("")).alias(icol)

and use it similarly to what you are doing (I would build a function around
your if logic, its easier to understand):

def _if_not_in_processing(icol):
    return icol if (icol not in colprocessing) else replace_empty(icol)

dfTotaleNormalize53 = dfTotaleNormalize52.select([_if_not_in_processing(i) for
i in dfTotaleNormalize52.columns])

Otherwise there isn't anything obvious to me as to why it isn't working. If
you actually do have pySpark 1.5 and not 1.6 I know it handles UDF
registration differently.

Hope this helps.

Nicholas Szandor Hakobian, Ph.D.
Senior Data Scientist
Rally Health
nicholas.hakobian@rallyhealth.com

On Wed, Apr 19, 2017 at 5:13 AM, issues solution <is...@gmail.com>
wrote:

> Pyspark 1.6  On cloudera 5.5  (yearn)
>
> 2017-04-19 13:42 GMT+02:00 issues solution <is...@gmail.com>:
>
>> Hi ,
>> somone can tell  me why i get the folowing error  with udf apply  like
>> udf
>>
>> def replaceCempty(x):
>>     if x is None :
>>         return ""
>>     else :
>>         return x.encode('utf-8')
>> udf_replaceCempty = F.udf(replaceCempty,StringType())
>>
>> dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i not in
>> colprocessing  else  udf_replaceCempty(F.col(i)).alias(i) for i in
>> dfTotaleNormalize52.columns])
>>
>>
>> java.lang.java.lang.UnsupportedOperationException
>>
>>  Cannot evaluate expression: PythonUDF#replaceCempty(input[77,string])
>>
>> ??
>> regards
>>
>>
>>
>>
>

Re: java.lang.java.lang.UnsupportedOperationException

Posted by issues solution <is...@gmail.com>.

Pyspark 1.6  On cloudera 5.5  (yearn)

2017-04-19 13:42 GMT+02:00 issues solution <is...@gmail.com>:

> Hi ,
> somone can tell  me why i get the folowing error  with udf apply  like udf
>
> def replaceCempty(x):
>     if x is None :
>         return ""
>     else :
>         return x.encode('utf-8')
> udf_replaceCempty = F.udf(replaceCempty,StringType())
>
> dfTotaleNormalize53 = dfTotaleNormalize52.select([i if i not in
> colprocessing  else  udf_replaceCempty(F.col(i)).alias(i) for i in
> dfTotaleNormalize52.columns])
>
>
> java.lang.java.lang.UnsupportedOperationException
>
>  Cannot evaluate expression: PythonUDF#replaceCempty(input[77,string])
>
> ??
> regards
>
>
>
>