Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2019/04/04 16:01:00 UTC

[jira] [Commented] (SPARK-27353) PySpark Row __repr__ bug

    [ https://issues.apache.org/jira/browse/SPARK-27353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810009#comment-16810009 ] 

Bryan Cutler commented on SPARK-27353:
--------------------------------------

This works for me on master. Can you provide a script that reproduces the failure?

In [1]: from pyspark.sql.types import Row

In [2]: import datetime

In [3]: Row(d=datetime.date.today())
Out[3]: Row(d=datetime.date(2019, 4, 4))

In [4]: repr(Row(d=datetime.date.today()))
Out[4]: 'Row(d=datetime.date(2019, 4, 4))'
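
For comparison, here is a minimal sketch of why the keyword form can succeed while the positional form used in the report (Row(*row.values)) fails. RowSketch is a hypothetical stand-in, not the real pyspark.sql.Row. It assumes only that keyword construction records __fields__ while positional construction does not, and it reuses the __repr__ quoted in the issue.

```python
import datetime


class RowSketch(tuple):
    """Hypothetical stand-in for pyspark.sql.Row (an assumption, not the real class)."""

    def __new__(cls, *args, **kwargs):
        if kwargs:
            # Keyword form: keep the field names next to the values.
            names = sorted(kwargs)
            row = tuple.__new__(cls, [kwargs[n] for n in names])
            row.__fields__ = names
            return row
        # Positional form: a plain tuple with no __fields__ attribute.
        return tuple.__new__(cls, args)

    def __repr__(self):
        # Same logic as the __repr__ quoted in the issue description.
        if hasattr(self, "__fields__"):
            return "Row(%s)" % ", ".join(
                "%s=%r" % (k, v) for k, v in zip(self.__fields__, tuple(self)))
        else:
            return "<Row(%s)>" % ", ".join(self)


d = datetime.date(2019, 4, 4)

# Keyword construction takes the first branch, which formats values with %r.
keyword_repr = repr(RowSketch(d=d))
print(keyword_repr)

# Positional construction takes the else branch, where str.join rejects non-str items.
try:
    repr(RowSketch(d))
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)
print(positional_error)
```

If the real class behaves like this sketch, a reproduction script would need to build the Row from positional values rather than keyword arguments.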

> PySpark Row __repr__ bug
> ------------------------
>
>                 Key: SPARK-27353
>                 URL: https://issues.apache.org/jira/browse/SPARK-27353
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Ihor Bobak
>            Priority: Major
>
> The Row class has this implementation of __repr__:
>     def __repr__(self):
>         """Printable representation of Row used in Python REPL."""
>         if hasattr(self, "__fields__"):
>             return "Row(%s)" % ", ".join("%s=%r" % (k, v)
>                                          for k, v in zip(self.__fields__, tuple(self)))
>         else:
>             return "<Row(%s)>" % ", ".join(self)
>  
> The last line fails when a Row built from positional values contains a datetime.date instance:
> TypeError                                 Traceback (most recent call last)
> <ipython-input-41-02c2f5a33c6e> in <module>
>       2     print(*row.values)
>       3     df_row = Row(*row.values)
> ----> 4     print(repr(df_row))
>       5     break
>       6 
> E:\spark\spark-2.3.2-bin-without-hadoop\python\pyspark\sql\types.py in __repr__(self)
>    1579                                          for k, v in zip(self.__fields__, tuple(self)))
>    1580         else:
> -> 1581             return "<Row(%s)>" % ", ".join(self)
>    1582 
>    1583 
> TypeError: sequence item 0: expected str instance, datetime.date found
>  
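
One possible way to make the else branch tolerate non-string values is to format each item with %r before joining, as the first branch already does. The sketch below is an illustration only, not a confirmed patch; format_unnamed_row is a hypothetical helper name introduced here.

```python
import datetime


def format_unnamed_row(values):
    """Hypothetical helper: the else branch with %r applied to every item."""
    # Wrapping v in a 1-tuple keeps the % operator safe if v is itself a tuple.
    return "<Row(%s)>" % ", ".join("%r" % (v,) for v in values)


text = format_unnamed_row((datetime.date(2019, 4, 4), "a", 1))
print(text)
```

The trade-off is that string items would then be shown quoted ('a' rather than a), which changes the output for rows that contain only strings.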