Posted to user@spark.apache.org by "Bode, Meikel, NMA-CFD" <Me...@Bertelsmann.de> on 2021/05/21 11:27:51 UTC
DF blank value fill
Hi all,
My df looks like follows:
Situation:
MainKey, SubKey, Val1, Val2, Val3, ...
1, 2, a, null, c
1, 2, null, null, c
1, 3, null, b, null
1, 3, a, null, c
Desired outcome:
1, 2, a, b, c
1, 2, a, b, c
1, 3, a, b, c
1, 3, a, b, c
How could I populate/synchronize empty cells of all records with the same combination of MainKey and SubKey with the respective value of other rows with the same key combination?
A given non-null value in a column is guaranteed to be unique within the df. If a column exists, there is at least one row with a non-null value.
I am using pyspark.
Thanks for any hint,
Best
Meikel
Re: DF blank value fill
Posted by ayan guha <gu...@gmail.com>.
Hi
You can do something like this:
SELECT MainKey, SubKey,
  CASE WHEN val1 IS NULL THEN newval1 ELSE val1 END AS val1,
  CASE WHEN val2 IS NULL THEN newval2 ELSE val2 END AS val2,
  CASE WHEN val3 IS NULL THEN newval3 ELSE val3 END AS val3
FROM (
  SELECT MainKey, SubKey, val1, val2, val3,
    FIRST_VALUE(val1) OVER (PARTITION BY MainKey, SubKey
      ORDER BY val1 NULLS LAST) AS newval1,
    FIRST_VALUE(val2) OVER (PARTITION BY MainKey, SubKey
      ORDER BY val2 NULLS LAST) AS newval2,
    FIRST_VALUE(val3) OVER (PARTITION BY MainKey, SubKey
      ORDER BY val3 NULLS LAST) AS newval3
  FROM table
) x
--
Best Regards,
Ayan Guha