Posted to user@spark.apache.org by Manjunath Shetty H <ma...@live.com> on 2020/05/16 14:50:40 UTC

How to change Dataframe schema

Hi,

I have a DataFrame whose columns and data are fetched over JDBC. To keep the schema of the ORC file consistent, I have to apply a different schema to that DataFrame. Column names will be the same, but the data or the schema may contain some extra columns.

Is there any way I can apply the schema on top of the existing DataFrame? In most cases the schema will just reorder the columns.

I have tried this:

DataFrame dfNew = hc.createDataFrame(df.rdd(), (StructType) DataType.fromJson(schema));

But this maps the columns by index, so it fails when the columns are reordered.

Any pointers will be helpful.

Thanks and Regards
Manjunath Shetty

Re: How to change Dataframe schema

Posted by Adi Polak <po...@gmail.com>.
Hi Manjunath,
Can you share a sample of the data?
From the information shared above, it seems that you will need to apply
a mapping with custom logic to the rows in your RDD, so that they are
consistent with the target schema before you apply it.
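
As a rough illustration of such a mapping, here is a minimal plain-Java sketch (no Spark dependency) that rearranges one row's values by column name rather than by position. The class name ReorderByName, the helper reorder, and the sample column names are hypothetical, and filling missing target columns with null is an assumption, not something stated in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReorderByName {

    // Rearranges one row's values from the source column order into the
    // target column order, matching columns by name instead of by index.
    // Source columns absent from the target are dropped; target columns
    // absent from the source are filled with null (an assumption).
    static Object[] reorder(List<String> sourceNames, List<String> targetNames, Object[] row) {
        Map<String, Integer> indexByName = new HashMap<>();
        for (int i = 0; i < sourceNames.size(); i++) {
            indexByName.put(sourceNames.get(i), i);
        }
        Object[] out = new Object[targetNames.size()];
        for (int i = 0; i < targetNames.size(); i++) {
            Integer src = indexByName.get(targetNames.get(i));
            out[i] = (src == null) ? null : row[src];
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical column names: the source has an extra column and a
        // different order than the target schema.
        List<String> source = Arrays.asList("name", "id", "extra");
        List<String> target = Arrays.asList("id", "name", "city");
        Object[] row = {"alice", 1, "ignored"};
        // prints [1, alice, null]
        System.out.println(Arrays.toString(reorder(source, target, row)));
    }
}
```

In a Spark job, logic like this would run inside a map over the RDD's rows before the schema is applied. When the columns differ only in order, selecting them by name in the target order may already be enough.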

I recommend reading about the mapping functionality here:
https://data-flair.training/blogs/apache-spark-map-vs-flatmap/

I hope it helps!

-Adi