Posted to user@spark.apache.org by Vitaliy Pisarev <vi...@biocatch.com> on 2018/03/07 11:24:09 UTC

Do values adjacent to exploded columns get duplicated?

This is a fairly basic question but I did not find an answer to it anywhere
online:

Suppose I have the following data frame (a and b are column names):

a      |       b
---------------
1      |    [x1,x2,x3,x4] # this is an array


Now I explode column b and logically get:

a      |       b
---------------
1      |      x1
1      |      x2
1      |      x3
1      |      x4

Are the values in the adjacent columns *actually* duplicated?

Re: Do values adjacent to exploded columns get duplicated?

Posted by Anshul Sachdeva <sa...@gmail.com>.
Yes. Every column other than the exploded one has its value repeated in each
output row, because explode pairs each element of the array with the values
of the other columns in that row.
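
To make the row-level behaviour concrete, here is a minimal sketch in plain Python. It is only a model of the logical result, not Spark code and not a description of how Spark stores the data internally; the helper name `explode_rows` is made up for illustration. In PySpark the corresponding call would be `pyspark.sql.functions.explode`.

```python
def explode_rows(rows, column):
    """Model of explode: yield one output row per element of row[column].

    Every other column is copied unchanged into each output row.
    Rows whose array is empty or None produce no output rows.
    """
    for row in rows:
        for element in (row[column] or []):
            new_row = dict(row)        # copy the non-exploded columns
            new_row[column] = element  # replace the array with one element
            yield new_row


# The data frame from the question, modelled as a list of dicts.
df = [{"a": 1, "b": ["x1", "x2", "x3", "x4"]}]

exploded = list(explode_rows(df, "b"))
for r in exploded:
    print(r)
```

Each of the four output rows carries the same value of `a` next to one element of `b`, which matches the table in the question.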

Hope that clears it up.

Regards
Ansh
