Posted to user@spark.apache.org by "Weiand, Markus" <ma...@bertelsmann.de> on 2022/11/18 09:31:14 UTC

pyspark read.csv() doesn't respect locale when reading float

Hello!

I want to read csv files with pyspark using (spark_session).read.csv().
There is a whole bunch of useful options, in particular an option "locale", but nonetheless a decimal comma instead of a decimal point is not understood when reading float/double input, even when the locale is set to 'de-DE'. I am using Spark 3.2.0.
Of course I can read the column as a string and write my own float parser, but doing that in Python will be inefficient.
And a simple CSV generated by Excel will have decimal commas if it was written in Germany (with a German-localized Excel).
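For what it's worth, the fallback conversion looks roughly like this. This is only a sketch of the string-munging approach, not tested against my actual files; the function name is made up:

```python
# A minimal sketch of the workaround: treat the column as a string and
# normalize the German number format ("1.234,56") before converting.
def parse_de_decimal(s: str) -> float:
    """Convert a German-formatted decimal string to a float."""
    # Remove thousands separators ('.') first, then swap the decimal comma
    # for a decimal point.
    return float(s.replace(".", "").replace(",", "."))
```

In PySpark the same transformation can presumably be kept on the JVM side, e.g. with pyspark.sql.functions.regexp_replace (note the escaped dot, since the pattern is a regex) followed by .cast("double"), which would avoid the Python-UDF overhead mentioned above.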

Markus