Posted to user@spark.apache.org by Mohamed Samir <mo...@gmail.com> on 2021/12/14 22:38:01 UTC

spark.read.schema returns null for DataFrame column values

Hi,

I have a small question and an issue that I hope the Spark gurus can help me with.

I have a Parquet file, person.parquet, that has multiple columns and one row.
One of the columns, "Middle Name", has a space in its name, which causes an
issue when Spark writes it to Parquet format:
[image: image.png]

What I have done is rename the column to remove the space, as below:
SourceData = SourceData.withColumnRenamed("Middle Name","MiddleName")

If I try to write SourceData to a Parquet file, it still returns an error:
Caused by: org.apache.spark.sql.AnalysisException: Attribute name "Middle
Name" contains invalid character(s) among " ,;{}()\n\t=". Please use alias
to rename it.
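
The character set in that message can be used to check which column names would be rejected. Below is a minimal sketch in plain Python; the helper names invalid_columns and sanitized are hypothetical (not part of Spark), and the example list contains only the two names mentioned in this post.

```python
# Characters rejected in attribute names, copied from the error message above
# (\n and \t are the newline and tab characters).
INVALID_CHARS = set(" ,;{}()\n\t=")


def invalid_columns(column_names):
    """Return the names that contain at least one rejected character."""
    return [
        name for name in column_names
        if any(ch in INVALID_CHARS for ch in name)
    ]


def sanitized(name, replacement=""):
    """Replace every rejected character in one column name (default: drop it)."""
    return "".join(replacement if ch in INVALID_CHARS else ch for ch in name)


# Hypothetical example with plain strings; with a DataFrame, pass
# SourceData.columns instead of this list.
columns = ["Middle Name", "MiddleName"]
print(invalid_columns(columns))
print([sanitized(c) for c in columns])
```

With a real DataFrame, the cleaned names could be applied in one step with SourceData.toDF(*[sanitized(c) for c in SourceData.columns]). This only renames columns; it is not confirmed to avoid the exception shown above.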

So I used the line below, which solves the error, but unfortunately the
generated file has null values for the column MiddleName:
SourceData = spark.read.schema(SourceData.schema).parquet(TestingPath)

[image: image.png]
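
One possible explanation for the nulls, offered as an assumption rather than a confirmed diagnosis: SourceData.schema contains the renamed field MiddleName, while the file at TestingPath (assumed here to be the original file) still stores the column as "Middle Name". If the supplied schema is resolved against the file by column name, a field with no matching column comes back as null. The toy model below is plain Python, not Spark code; read_by_name and the sample value are hypothetical.

```python
def read_by_name(file_rows, requested_columns):
    """Toy model of by-name resolution: a requested column that is absent
    from the stored row is returned as None instead of raising an error."""
    return [
        {col: row.get(col) for col in requested_columns}
        for row in file_rows
    ]


# Hypothetical one-row file that still carries the original column name.
file_rows = [{"Middle Name": "A."}]

# Requesting the renamed column finds no match, so the value is None.
print(read_by_name(file_rows, ["MiddleName"]))

# Requesting the name actually stored in the file returns the value.
print(read_by_name(file_rows, ["Middle Name"]))
```

If this is what is happening, the schema passed to spark.read.schema would need to use the names stored in the file, with the rename applied after the read.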

Any suggestions on how to solve this issue?