
Posted to user@spark.apache.org by 한승후 <si...@naver.com> on 2022/12/05 04:50:02 UTC

How can I use backticks in column names?

Spark throws an exception if there are backticks in the column name.

Please help me.

Re: How can I use backticks in column names?

Posted by Bjørn Jørgensen <bj...@gmail.com>.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("china", "asia"), ("colombia", "south america`")],
    ["country", "continent`"]
)
df.show()


+--------+--------------+
| country|    continent`|
+--------+--------------+
|   china|          asia|
|colombia|south america`|
+--------+--------------+



df.select("continent`").show(1)

(...)

AnalysisException: Syntax error in attribute name: continent`.
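

The error comes from the way Spark splits a column reference string into name parts (in the Spark 3.x sources this happens in UnresolvedAttribute.parseAttributeName). Below is a simplified Python re-implementation of those rules, written only to illustrate them. parse_attribute_name is a hypothetical name, this is not Spark's code, and edge cases may differ. It shows that a backtick may only open a quoted name part, so "continent`" with a trailing backtick is rejected. Inside a quoted part, a doubled backtick stands for one literal backtick.

```python
def parse_attribute_name(name: str) -> list[str]:
    """Split a column reference into name parts, roughly following Spark's rules.

    Hypothetical, simplified illustration, not Spark's implementation:
    - outside backticks, '.' separates name parts;
    - a backtick may only open a quoted part at the start of that part;
    - inside backticks, a doubled backtick means one literal backtick;
    - a closing backtick must be followed by '.' or the end of the string.
    """
    def error() -> ValueError:
        return ValueError(f"Syntax error in attribute name: {name}.")

    parts: list[str] = []
    current: list[str] = []
    in_backtick = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_backtick:
            if ch == "`":
                if i + 1 < len(name) and name[i + 1] == "`":
                    current.append("`")  # escaped backtick: keep one, skip the other
                    i += 1
                else:
                    in_backtick = False  # end of the quoted part
                    if i + 1 < len(name) and name[i + 1] != ".":
                        raise error()
            else:
                current.append(ch)  # any character, including '.', is literal here
        else:
            if ch == "`":
                if current:  # backtick in the middle of an unquoted part
                    raise error()
                in_backtick = True
            elif ch == ".":
                if i == 0 or name[i - 1] == "." or i == len(name) - 1:
                    raise error()  # empty name part
                parts.append("".join(current))
                current = []
            else:
                current.append(ch)
        i += 1
    if in_backtick:
        raise error()  # unterminated quoted part
    parts.append("".join(current))
    return parts


if __name__ == "__main__":
    print(parse_attribute_name("`continent```"))  # ['continent`']
    try:
        parse_attribute_name("continent`")
    except ValueError as exc:
        print(exc)  # Syntax error in attribute name: continent`.
```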



# Rename every column, replacing each backtick in its name with an underscore
clean_df = df.toDF(*(c.replace('`', '_') for c in df.columns))
clean_df.show()


+--------+--------------+
| country|    continent_|
+--------+--------------+
|   china|          asia|
|colombia|south america`|
+--------+--------------+


clean_df.select("continent_").show(2)


+--------------+
|    continent_|
+--------------+
|          asia|
|south america`|
+--------------+


Examples are adapted from the MungingData article "Avoiding Dots / Periods in PySpark
Column Names" <https://mungingdata.com/pyspark/avoid-dots-periods-column-names/>
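
If you would rather keep the original column names, another option is to quote the name when you reference it: wrap it in backticks and double any backtick inside it, which is how Spark SQL escapes a backtick in a quoted identifier. A minimal sketch, assuming a hypothetical helper called quote_column (it is not part of PySpark):

```python
def quote_column(name: str) -> str:
    """Return name as a backtick-quoted Spark column reference.

    Hypothetical helper for illustration, not a PySpark API. Each embedded
    backtick is doubled, which Spark SQL reads as one literal backtick, and
    the whole name is wrapped in backticks so that dots in it are not
    treated as struct-field separators.
    """
    return "`" + name.replace("`", "``") + "`"


if __name__ == "__main__":
    for raw in ["continent`", "a.b", "country"]:
        print(f"{raw!r} -> {quote_column(raw)!r}")
```

With the df from the first example, df.select(quote_column("continent`")).show(1) passes "`continent```" to Spark, which should select the original continent` column without renaming it. The same quoting also covers the problem with dots in column names that the linked article describes.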


On Mon, 5 Dec 2022 at 06:56, 한승후 <si...@naver.com> wrote:

> Spark throws an exception if there are backticks in the column name.
>
> Please help me.
>


-- 
Bjørn Jørgensen