Posted to user@spark.apache.org by issues solution <is...@gmail.com> on 2017/04/13 09:03:06 UTC

checkpoint

Hi,
I am new to Spark and I want to ask what is wrong with checkpoint on
PySpark 1.6.0.

I don't understand what happens when I try to use it on a DataFrame:

dfTotaleNormalize24 = dfTotaleNormalize23.select([
    i if i not in listrapcot else udf_Grappra(F.col(i)).alias(i)
    for i in dfTotaleNormalize23.columns
])

dfTotaleNormalize24.cache()            # cache in memory
dfTotaleNormalize24.count()            # materialize the DataFrame (the RDD too?)
dfTotaleNormalize24.rdd.checkpoint()   # cut the DAG; the RDD is not saved yet
dfTotaleNormalize24.rdd.count()        # materialize the checkpoint to file

But why do I get the following error?

 java.lang.UnsupportedOperationException: Cannot evaluate expression:
 PythonUDF#Grappra(input[410, StringType])


Thanks in advance for explaining the details and the steps to save and
checkpoint.

My DataFrame is huge, with more than 5 million rows and 1000 columns.

The UDF above is applied to more than 150 columns; all it does is replace
' ' with 0.0.

regards

Re: checkpoint

Posted by ayan guha <gu...@gmail.com>.

It looks like your UDF expects numeric data but you are passing it a string
type. I suggest casting the column to a numeric type.
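
A rough sketch of what that could look like, in case it helps. It assumes the
affected columns hold either blanks or numeric strings, and it replaces the
Python UDF with built-in column expressions, which is a swapped-in technique
and not necessarily what udf_Grappra does. The names blank_to_zero,
normalize_columns and cache_then_checkpoint, and the checkpoint_dir argument,
are hypothetical. Whether this avoids the error on 1.6.0 is not verified here.

```python
def blank_to_zero(value):
    """Plain-Python form of the replacement described in the question.

    Assumption: a blank string becomes 0.0 and anything else is a
    numeric string that can be parsed as a float.
    """
    if value is None:
        return None
    text = str(value).strip()
    return 0.0 if text == "" else float(text)


def normalize_columns(df, columns_to_fix):
    """Return df with the listed string columns cast to double.

    Blanks map to 0.0. Uses built-in column expressions rather than a
    Python UDF, so non-numeric strings become null instead of raising.
    """
    from pyspark.sql import functions as F  # lazy import; needs PySpark

    def fix(name):
        return (F.when(F.trim(F.col(name)) == "", F.lit(0.0))
                 .otherwise(F.col(name).cast("double"))
                 .alias(name))

    return df.select([fix(name) if name in columns_to_fix else F.col(name)
                      for name in df.columns])


def cache_then_checkpoint(sc, df, checkpoint_dir):
    """Cache the DataFrame, then checkpoint its underlying RDD."""
    sc.setCheckpointDir(checkpoint_dir)  # must be set before checkpoint()
    df.cache()
    df.count()        # action: materializes the cache
    rdd = df.rdd      # keep a single reference to the underlying RDD
    rdd.checkpoint()  # only marks the RDD; nothing is written yet
    rdd.count()       # action: writes the checkpoint files
    return rdd
```

With the names from the question, usage would be along the lines of
normalize_columns(dfTotaleNormalize23, listrapcot) followed by
cache_then_checkpoint(sc, result, some_directory).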

-- 
Best Regards,
Ayan Guha