Posted to user@spark.apache.org by SamPenrose <sp...@mozilla.com> on 2016/11/12 00:36:25 UTC
pyspark: accept unicode column names in DataFrame.corr and cov
The corr() and cov() methods of DataFrame require an instance of str for
column names:
https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L1356
although instances of basestring appear to work for addressing columns:
https://github.com/apache/spark/blob/master/python/pyspark/sql/dataframe.py#L708
Humble request: could we replace the "isinstance(col1, str)" tests with
"isinstance(col1, basestring)"?
Less humble request: why test types at all? Why not just do one of {raise
KeyError, coerce to string}?
Cheers,
Sam
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/pyspark-accept-unicode-column-names-in-DataFrame-corr-and-cov-tp28065.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org
Re: pyspark: accept unicode column names in DataFrame.corr and cov
Posted by Hyukjin Kwon <gu...@gmail.com>.
Hi Sam,
I think I have answers for your two questions.
> Humble request: could we replace the "isinstance(col1, str)" tests with
"isinstance(col1, basestring)"?
IMHO, yes, I believe this should be basestring. Otherwise, some functions
will not accept unicode as column arguments in Python 2.7.
> Less humble request: why test types at all? Why not just do one of {raise
KeyError, coerce to string}?
I believe argument type checking is pretty common in other Python libraries
too, such as NumPy.
ValueError might be more appropriate because it is the type of the value
that is not correct.
Also, I think coercing it to a string might confuse users.
If the current way is problematic and incoherent, I guess you could change
it, but I think it is okay as it is.
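For what it's worth, here is a small sketch (hypothetical, not Spark's actual code) of why silent coercion could confuse users: str() happily converts almost anything, so a wrong argument would surface later as a missing-column error with a mangled name instead of an immediate, clear error about the argument type:

```python
def corr_with_coercion(col1, columns):
    # Hypothetical variant that coerces the argument instead of
    # type-checking it (illustrative only, not Spark's actual code).
    col1 = str(col1)
    if col1 not in columns:
        raise KeyError(col1)
    return col1

# A float passed by mistake is silently turned into the string "1.5";
# the user then sees a confusing KeyError('1.5') for a column that was
# never meant to exist, instead of an immediate type error.
try:
    corr_with_coercion(1.5, ["age", "height"])
except KeyError as e:
    print(e)  # prints '1.5'
```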
Thanks.