Posted to user@spark.apache.org by Colin Williams <co...@gmail.com> on 2018/11/22 02:25:49 UTC

Casting nested columns and updated nested struct fields.

Hello,

I'm currently trying to update the schema for a dataframe with nested
columns. I would either like to update the schema itself or cast the
column without having to explicitly select all the columns just to
cast one.

With regard to updating the schema, it looks like I would probably
need to write a more complex map over the schema to find the
StructFields I want to update and replace them. I haven't found any
examples of this, but it seems like there should be a simpler way to
do it.
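
A rough sketch of what such a map over the schema might look like is
below. updateFieldType is a hypothetical helper, not a Spark API. It
assumes the path only passes through struct fields (no arrays or maps)
and that field names match case-sensitively.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper (not part of Spark): returns a copy of `schema` in which
// the field reached by `path` has its data type replaced by `newType`.
def updateFieldType(schema: StructType, path: Seq[String], newType: DataType): StructType =
  StructType(schema.fields.map { field =>
    (path, field.dataType) match {
      // Last path segment matches: swap the type, keep name, nullability, metadata.
      case (Seq(name), _) if name == field.name =>
        field.copy(dataType = newType)
      // An earlier segment matches a struct: recurse into it with the remaining path.
      case (name +: rest, nested: StructType) if name == field.name =>
        field.copy(dataType = updateFieldType(nested, rest, newType))
      // Anything else is left untouched.
      case _ => field
    }
  })

// Illustrative schema mirroring the column path used in this message.
val schema = new StructType().add("existing", new StructType().add("top",
  new StructType().add("level", new StructType().add("FIELD_NAME", IntegerType))))

val updated =
  updateFieldType(schema, "existing.top.level.FIELD_NAME".split('.').toSeq, LongType)
println(updated.treeString)
```

Note that this only produces a new StructType. It does not change an
existing DataFrame; the schema would still have to be applied, for
example by passing it to spark.read.schema(...) when loading the data.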

As for changing the column on the DataFrame itself, using e.g.

val newDF = df.withColumn("existing.top.level.FIELD_NAME",df.col("existing.top.level.FIELD_NAME").cast(LongType))

I end up with a new column named "existing.top.level.FIELD_NAME" at
the root level instead of the nested column being updated to the new
type. Has anybody worked out how to update a nested column's data
type, and also how to update the field's type in the nested schema
StructType? Are there any easy ways to do this, or is there a reason
it is not trivial?
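
One workaround that avoids re-selecting every top-level column is to
rebuild only the enclosing struct and overwrite it with withColumn.
The sketch below handles a single level of nesting; castChild is a
hypothetical helper name, and it assumes the field names contain no
dots or other characters that would need backtick quoting.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper: casts the direct child `child` of the top-level struct
// column `parent` to `newType`, keeping the struct's other fields as they are.
def castChild(df: DataFrame, parent: String, child: String, newType: DataType): DataFrame = {
  // Field names of the struct, read from the DataFrame's own schema.
  val fieldNames = df.schema(parent).dataType.asInstanceOf[StructType].fieldNames
  val rebuilt = fieldNames.map { name =>
    val field = col(s"$parent.$name")
    // Cast only the requested field; alias each one so the struct keeps its names.
    (if (name == child) field.cast(newType) else field).as(name)
  }
  // Using the existing top-level name replaces that column instead of adding one.
  df.withColumn(parent, struct(rebuilt: _*))
}
```

For deeper nesting, such as existing.top.level.FIELD_NAME, the same
rebuild would have to be repeated at each level, which is a large part
of why this is not trivial.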

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: Casting nested columns and updated nested struct fields.

Posted by Colin Williams <co...@gmail.com>.
Looks like it's been reported already. It's too bad the issue has been
open for a year, but it looks like the fix should be released in Spark 3:
https://issues.apache.org/jira/browse/SPARK-22231
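
For readers on Spark 3.1 or later, Column.withField is one way to
express the cast without rebuilding the struct by hand. The sketch
below assumes the column layout from the original question (a
top-level struct named existing containing top.level.FIELD_NAME, with
no arrays along the path); castNested is a hypothetical wrapper name.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.LongType

// Assumes Spark 3.1+ (where Column.withField exists) and a top-level struct
// column `existing` whose nested path top.level.FIELD_NAME is all structs.
def castNested(df: DataFrame): DataFrame =
  df.withColumn(
    "existing",
    // withField replaces the nested field in place and returns the whole struct,
    // so the result is written back under the existing top-level name.
    col("existing").withField(
      "top.level.FIELD_NAME",
      col("existing.top.level.FIELD_NAME").cast(LongType)))
```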
On Fri, Nov 23, 2018 at 8:42 AM Colin Williams
<co...@gmail.com> wrote:
>
> Seems like it's worthy of filing a bug against withColumn



Re: Casting nested columns and updated nested struct fields.

Posted by Colin Williams <co...@gmail.com>.
Seems like it's worthy of filing a bug against withColumn
