Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2018/09/09 04:32:00 UTC

[jira] [Commented] (SPARK-25072) PySpark custom Row class can be given extra parameters

    [ https://issues.apache.org/jira/browse/SPARK-25072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16608292#comment-16608292 ] 

Apache Spark commented on SPARK-25072:
--------------------------------------

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/22369

> PySpark custom Row class can be given extra parameters
> ------------------------------------------------------
>
>                 Key: SPARK-25072
>                 URL: https://issues.apache.org/jira/browse/SPARK-25072
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.2.0
>         Environment: {noformat}
> SPARK_MAJOR_VERSION is set to 2, using Spark2
> Python 3.4.5 (default, Dec 11 2017, 16:57:19)
> Type 'copyright', 'credits' or 'license' for more information
> IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.
> Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
> 18/08/01 04:49:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 18/08/01 04:49:17 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
> 18/08/01 04:49:27 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /__ / .__/\_,_/_/ /_/\_\   version 2.2.0
>       /_/
> Using Python version 3.4.5 (default, Dec 11 2017 16:57:19)
> SparkSession available as 'spark'.
> {noformat}
> {{CentOS release 6.9 (Final)}}
> {{Linux sandbox-hdp.hortonworks.com 4.14.0-1.el7.elrepo.x86_64 #1 SMP Sun Nov 12 20:21:04 EST 2017 x86_64 x86_64 x86_64 GNU/Linux}}
> {noformat}
> openjdk version "1.8.0_161"
> OpenJDK Runtime Environment (build 1.8.0_161-b14)
> OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
> {noformat}
>            Reporter: Jan-Willem van der Sijp
>            Assignee: Li Yuanjian
>            Priority: Minor
>             Fix For: 2.3.2, 2.4.0, 3.0.0
>
>
> When a custom Row class is created in PySpark, its constructor can be given more positional arguments than there are columns. These extra arguments become part of the Row's value (they affect equality, for example), but they do not appear in the {{repr}} or {{str}} output or in {{asDict()}}, making errors caused by these "invisible" values hard to debug. The hidden values can still be accessed through integer-based indexing.
> Some examples:
> {code:python}
> In [69]: RowClass = Row("column1", "column2")
> In [70]: RowClass(1, 2) == RowClass(1, 2)
> Out[70]: True
> In [71]: RowClass(1, 2) == RowClass(1, 2, 3)
> Out[71]: False
> In [75]: RowClass(1, 2, 3)
> Out[75]: Row(column1=1, column2=2)
> In [76]: RowClass(1, 2)
> Out[76]: Row(column1=1, column2=2)
> In [77]: RowClass(1, 2, 3).asDict()
> Out[77]: {'column1': 1, 'column2': 2}
> In [78]: RowClass(1, 2, 3)[2]
> Out[78]: 3
> In [79]: repr(RowClass(1, 2, 3))
> Out[79]: 'Row(column1=1, column2=2)'
> In [80]: str(RowClass(1, 2, 3))
> Out[80]: 'Row(column1=1, column2=2)'
> {code}
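The behavior above can be reproduced outside Spark. Below is a minimal pure-Python sketch of the mechanism (a hypothetical MiniRow class, not Spark's actual implementation): a tuple subclass stores every positional value, but repr and asDict only zip the declared field names against the values, so extra values are hidden from display while tuple equality and indexing still see them.

```python
# Minimal sketch of the mechanism, assuming a hypothetical MiniRow class
# (not Spark's actual Row implementation).
class MiniRow(tuple):
    """A tuple with named fields, loosely modeled on pyspark.sql.Row."""

    def __new__(cls, fields, *values):
        # No check that len(values) == len(fields) -- this mirrors the
        # reported bug; the natural fix is to raise ValueError here.
        row = super().__new__(cls, values)
        row._fields = fields
        return row

    def asDict(self):
        # zip() stops at the shorter sequence, so extra values vanish.
        return dict(zip(self._fields, self))

    def __repr__(self):
        # Same truncation: only declared fields are rendered.
        return "Row(%s)" % ", ".join(
            "%s=%r" % (k, v) for k, v in zip(self._fields, self))

r = MiniRow(["column1", "column2"], 1, 2, 3)
print(repr(r))                                      # Row(column1=1, column2=2)
print(r.asDict())                                   # {'column1': 1, 'column2': 2}
print(r[2])                                         # 3 -- hidden value still reachable
print(r == MiniRow(["column1", "column2"], 1, 2))   # False -- tuple equality sees all values
```

Because equality is inherited from tuple while display is driven by the field names, two rows can print identically yet compare unequal, which is exactly what the examples above demonstrate.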



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org