Posted to user@spark.apache.org by 한승후 <si...@naver.com> on 2022/12/05 04:50:02 UTC
How can I use backticks in column names?
Spark throws an exception if there are backticks in the column name.
Please help me.
Re: How can I use backticks in column names?
Posted by Bjørn Jørgensen <bj...@gmail.com>.
df = spark.createDataFrame(
    [("china", "asia"), ("colombia", "south america`")],
    ["country", "continent`"]
)
df.show()
+--------+--------------+
| country|    continent`|
+--------+--------------+
|   china|          asia|
|colombia|south america`|
+--------+--------------+
df.select("continent`").show(1)
(...)
AnalysisException: Syntax error in attribute name: continent`.
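The error above comes from the backtick being Spark's quoting character for identifiers, so a lone backtick in a name looks like an unterminated quote. One possible alternative to renaming, sketched here under the assumption that your Spark version accepts a doubled backtick inside a backtick-quoted name (I have not checked this against every release), is to quote the name yourself. The helper name quote_column is hypothetical and not part of PySpark:

```python
def quote_column(name: str) -> str:
    """Wrap a column name in backticks, doubling any backtick inside it.

    Assumption: Spark treats two consecutive backticks inside a quoted
    identifier as one literal backtick. Verify on your own Spark version.
    """
    return "`" + name.replace("`", "``") + "`"


# Hypothetical usage with the DataFrame from this thread (not executed here):
#   df.select(quote_column("continent`")).show(1)
quoted = quote_column("continent`")
print(quoted)
```

If the quoted form is rejected on your version, renaming the column as shown next is the safer route.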
clean_df = df.toDF(*(c.replace('`', '_') for c in df.columns))
clean_df.show()
+--------+--------------+
| country|    continent_|
+--------+--------------+
|   china|          asia|
|colombia|south america`|
+--------+--------------+
clean_df.select("continent_").show(2)
+--------------+
|    continent_|
+--------------+
|          asia|
|south america`|
+--------------+
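One caveat with a plain replace: if the DataFrame already has a column called continent_, the rename produces two columns with the same name. A minimal sketch of a rename that keeps names unique follows. The function sanitize_columns and its numeric-suffix scheme are my own illustration, not something from PySpark or the MungingData post:

```python
def sanitize_columns(columns, bad="`", replacement="_"):
    """Replace `bad` in column names and keep the renamed names unique.

    Names that do not contain `bad` are left untouched. A renamed column
    that would clash with an existing name gets a numeric suffix.
    """
    # Reserve the untouched names first so they are never renamed.
    used = {c for c in columns if bad not in c}
    result = []
    for c in columns:
        if bad not in c:
            result.append(c)
            continue
        base = c.replace(bad, replacement)
        candidate, n = base, 1
        while candidate in used:
            candidate = f"{base}{n}"
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


# Hypothetical usage with the DataFrame from this thread (not executed here):
#   clean_df = df.toDF(*sanitize_columns(df.columns))
print(sanitize_columns(["country", "continent`"]))
```

For the two-column example in this thread the result is the same as the one-liner above; the difference only shows up when a clash exists.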
The examples are from the MungingData post "Avoiding Dots / Periods in PySpark Column Names"
<https://mungingdata.com/pyspark/avoid-dots-periods-column-names/>
On Mon, 5 Dec 2022 at 06:56, 한승후 <si...@naver.com> wrote:
> Spark throws an exception if there are backticks in the column name.
>
> Please help me.
>
--
Bjørn Jørgensen