Posted to user@spark.apache.org by angelini <al...@shopify.com> on 2015/07/30 20:49:08 UTC

[Parquet + Dataframes] Column names with spaces

Hi all,

Our data has lots of human-readable column names (names that include
spaces). Is it possible to use these with Parquet and DataFrames?

When I try to write the DataFrame I get the following error:

(I am using PySpark)

`AnalysisException: Attribute name "Name with Space" contains invalid
character(s) among " ,;{}()\n\t=". Please use alias to rename it.`

How can I alias that column name?

`df['Name with Space'] = df['Name with Space'].alias('Name')` doesn't work
as you can't assign to a dataframe column.

`df.withColumnRenamed('Name with Space', 'Name')` overwrites the column and
doesn't alias it.

Any ideas?

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Parquet-Dataframes-Column-names-with-spaces-tp24088.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: [Parquet + Dataframes] Column names with spaces

Posted by Michael Armbrust <mi...@databricks.com>.
You can't use these names due to limitations in Parquet (and the library
itself will silently generate corrupt files that can't be read, hence the
error we throw).

You can alias a column with df.select(df("old").alias("new")), which is
essentially what withColumnRenamed does.  Alias in this case means renaming.
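
For a DataFrame with many such columns, the renaming could be driven by a
small helper. The following is only a sketch in plain Python: the names
`sanitize_column_name` and `build_rename_map` are hypothetical (they are not
part of Spark), and replacing each rejected character with an underscore is
an assumption, not something Spark or Parquet prescribes. The character set
is taken from the error message quoted in this thread.

```python
import re

# Characters listed in the AnalysisException quoted in this thread: " ,;{}()\n\t="
INVALID_CHARS = re.compile(r"[ ,;{}()\n\t=]")


def sanitize_column_name(name, replacement="_"):
    """Return `name` with every rejected character replaced by `replacement`.

    Hypothetical helper; the underscore default is an arbitrary choice.
    """
    return INVALID_CHARS.sub(replacement, name)


def build_rename_map(columns):
    """Map each original column name to its sanitized form.

    Raises ValueError if two columns would end up with the same name,
    since a silent collision would make the result ambiguous.
    """
    renamed = {}
    seen = set()
    for col in columns:
        new_name = sanitize_column_name(col)
        if new_name in seen:
            raise ValueError(f"sanitized name {new_name!r} is not unique")
        seen.add(new_name)
        renamed[col] = new_name
    return renamed
```

In PySpark the map could then be applied with something like
`df.select([df[old].alias(new) for old, new in build_rename_map(df.columns).items()])`.
This returns a new DataFrame and leaves `df` unchanged, so the result has to
be assigned to a variable before it is written out.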

On Thu, Jul 30, 2015 at 11:49 AM, angelini <al...@shopify.com>
wrote:

> Hi all,
>
> Our data has lots of human readable column names (names that include
> spaces), is it possible to use these with Parquet and Dataframes?
>
> When I try and write the Dataframe I get the following error:
>
> (I am using PySpark)
>
> `AnalysisException: Attribute name "Name with Space" contains invalid
> character(s) among " ,;{}()\n\t=". Please use alias to rename it.`
>
> How can I alias that column name?
>
> `df['Name with Space'] = df['Name with Space'].alias('Name')` doesn't work
> as you can't assign to a dataframe column.
>
> `df.withColumnRenamed('Name with Space', 'Name')` overwrites the column and
> doesn't alias it.
>
> Any ideas?
>
> Thanks