Posted to user@spark.apache.org by Chetan Khatri <ch...@gmail.com> on 2020/03/28 02:35:51 UTC

Best Practice: Evaluate Expression from Spark DataFrame Column

Hi Spark Users,

For each row, I want to evaluate an expression stored in one DataFrame
column against the other columns of the same DataFrame. Please suggest the
best approach that does not hurt the performance of the job.

Thanks

Sample code:

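// Assumes a SparkSession named `spark` is in scope (as in spark-shell).
import spark.implicits._

// `num` is declared as Option[Int] so that the missing value (None) can be encoded.
val sampleDF = Seq[(Option[Int], Int, String, String)](
  (Some(8), 1, "bat", "NUM IS NOT NULL AND FLAG IS NOT 0"),
  (Some(64), 0, "mouse", "NUM IS NOT NULL AND FLAG IS NOT 0"),
  (Some(-27), 1, "horse", "NUM IS NOT NULL AND FLAG IS NOT 0"),
  (None, 0, "miki", "NUM IS NOT NULL AND FLAG IS NOT 1 AND WORD IS 'MIKI'")
).toDF("num", "flag", "word", "expression")

// Desired: `status` should hold the result of evaluating `expression` against
// the row. As written, this line only copies the expression string.
val derivedDF = sampleDF.withColumn("status", sampleDF.col("expression"))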

Re: Best Practice: Evaluate Expression from Spark DataFrame Column

Posted by Chetan Khatri <ch...@gmail.com>.
Is there a way to pass a column's value as a String to the expr function in
Spark?

val sampleDF = Seq(
  (8, 1, "bat", "NUM IS NOT NULL AND FLAG!=0"),
  (64, 0, "mouse", "NUM IS NOT NULL AND FLAG!=0"),
  (-27, 1, "horse" , "NUM IS NOT NULL AND FLAG!=0"),
  (1, 0, "miki", "NUM IS NOT NULL AND FLAG!=0 AND WORD == 'MIKI'")
).toDF("num", "flag", "word", "expression")

val derivedDF = sampleDF.withColumn("status",
expr(sampleDF.col("expression").as[String].toString()))
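
One possible way to get this behaviour is sketched below. It rests on two
assumptions that are not stated in the thread: the number of distinct
expression strings is small enough to collect to the driver, and every
string is trusted, valid Spark SQL (for example FLAG != 0; as far as I know
FLAG IS NOT 0 does not parse). expr needs its SQL text on the driver when
the plan is built, so it cannot read a different string from each row.
The sketch therefore collects the distinct strings and builds one
CASE WHEN branch per string. The names EvalExpressionColumn and withStatus
are hypothetical helpers, not part of Spark.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit, when}

object EvalExpressionColumn {

  // Hypothetical helper (not a Spark API): adds a boolean "status" column
  // holding the result of the SQL predicate stored in `exprCol` for each row.
  def withStatus(df: DataFrame, exprCol: String = "expression"): DataFrame = {
    // Assumption: few distinct expressions, so collecting them is cheap.
    val distinctExprs: Array[String] = df
      .select(col(exprCol))
      .distinct()
      .collect()
      .filter(row => !row.isNullAt(0))
      .map(_.getString(0))

    // Chain one WHEN branch per expression; rows whose expression is null
    // fall through to a null status.
    val status: Column = distinctExprs.foldLeft(lit(null).cast("boolean")) {
      (acc, e) => when(col(exprCol) === e, expr(e)).otherwise(acc)
    }

    df.withColumn("status", status)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("eval-expression-column")
      .getOrCreate()
    import spark.implicits._

    val sampleDF = Seq(
      (8, 1, "bat", "NUM IS NOT NULL AND FLAG != 0"),
      (64, 0, "mouse", "NUM IS NOT NULL AND FLAG != 0"),
      (-27, 1, "horse", "NUM IS NOT NULL AND FLAG != 0"),
      (1, 0, "miki", "NUM IS NOT NULL AND FLAG != 0 AND WORD == 'MIKI'")
    ).toDF("num", "flag", "word", "expression")

    withStatus(sampleDF).show(false)
    spark.stop()
  }
}
```

Because every branch is an ordinary Column expression, the whole thing stays
inside the Catalyst plan and no UDF is involved. The cost is one extra
distinct/collect pass over the expression column, and the approach stops
being practical if almost every row carries a different expression.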
