Posted to user@spark.apache.org by marc nicole <mk...@gmail.com> on 2023/04/30 23:01:40 UTC

How to change column values using several when conditions?

Hello to you Sparkling community :)

I want to change the values of a column in a dataset according to a mapping
list that maps original values of that column to new values. Each element
of the list (colMappingValues) is a string of the form "original;new",
with ";" separating the original value from its replacement.
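
To make the format concrete, here is a small plain-Java sketch (no Spark
involved) of how such a list could be turned into a lookup map. The class
name MappingParser, the method parse and the sample values are made up for
illustration; only the split(";", 2) call comes from my actual code.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MappingParser {
    // Hypothetical helper: turns "original;new" strings into a map.
    // The limit of 2 splits at the first ';' only, so the new value
    // may itself contain ';'.
    public static Map<String, String> parse(List<String> colMappingValues) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String entry : colMappingValues) {
            String[] parts = entry.split(";", 2);
            if (parts.length == 2) {          // skip entries without a ';'
                mapping.put(parts[0], parts[1]);
            }
        }
        return mapping;
    }

    public static void main(String[] args) {
        // Sample values, invented for this example.
        List<String> colMappingValues = Arrays.asList("N;No", "Y;Yes", "U;Not;known");
        System.out.println(parse(colMappingValues));
    }
}
```

With these sample entries, "U;Not;known" maps "U" to "Not;known", because
only the first ";" acts as the separator.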

So for a given column (in the following example colName), I do the
following processing to alter the column values as described:

// requires: import static org.apache.spark.sql.functions.when;
for (int i = 0; i < colMappingValues.size(); i++) {
    // Each entry holds one distinct value of the column and its target
    // value, separated by the first ';'.
    String[] allValuesChanges = colMappingValues.get(i).toString().split(";", 2);

    // Replace matching values; keep every other value unchanged.
    dataset = dataset.withColumn(colName,
        when(dataset.col(colName).equalTo(allValuesChanges[0]), allValuesChanges[1])
            .otherwise(dataset.col(colName)));
}

This works, but I would like it to be more efficient and avoid unnecessary
iterations: when the column does not contain a given original value from
the list, the call to withColumn() should be skipped.
How can I do exactly that in a more efficient way using Spark in Java?
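
To illustrate the skipping I have in mind, here is a plain-Java sketch of
the filtering step only. It assumes a hypothetical set distinctValues that
already holds the distinct values of the column (how best to obtain it
from the dataset is part of my question). The names MappingFilter and
relevantEntries are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MappingFilter {
    // Hypothetical helper: keeps only the "original;new" entries whose
    // original value actually occurs in the column.
    public static List<String> relevantEntries(List<String> colMappingValues,
                                               Set<String> distinctValues) {
        List<String> kept = new ArrayList<>();
        for (String entry : colMappingValues) {
            String original = entry.split(";", 2)[0];
            if (distinctValues.contains(original)) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample data, invented for this example.
        List<String> colMappingValues = Arrays.asList("N;No", "Y;Yes", "U;Unknown");
        Set<String> distinctValues = new HashSet<>(Arrays.asList("Y", "U", "X"));
        // "N;No" is dropped because "N" does not occur in the column.
        System.out.println(relevantEntries(colMappingValues, distinctValues));
    }
}
```

The loop over the mapping list would then call withColumn() only for the
entries that survive this filter.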

Thanks.