Posted to user@spark.apache.org by pseudo oduesp <ps...@gmail.com> on 2016/06/08 12:05:15 UTC

comparing rows in a pyspark data frame

Hi,
How can we compare multiple columns in a DataFrame? I mean,

if df is a DataFrame like this:

                           df.col1 | df.col2 | df.col3
                           0.2       0.3       0.4

how can we compare the columns to get the maximum of each row (not of each
column), and get the name of the column where that maximum appears?

Thanks

Re: comparing rows in a pyspark data frame

Posted by Jacek Laskowski <ja...@japila.pl>.
On Wed, Jun 8, 2016 at 2:05 PM, pseudo oduesp <ps...@gmail.com> wrote:

> how can we compare the columns to get the maximum of each row (not of each
> column), and get the name of the column where that maximum appears?

First thought - a UDF.
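
For example, a rough, untested sketch of that UDF idea (it assumes the column
names from the example, no nulls in those columns, and the Spark 2.x
SparkSession API):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

# Recreate the single example row from the question.
df = spark.createDataFrame([(0.2, 0.3, 0.4)], ["col1", "col2", "col3"])
cols = df.columns

def pick_max(*values):
    # Pair each value with its column name and keep the pair with the
    # largest value (assumes none of the values are null).
    name, val = max(zip(cols, values), key=lambda nv: nv[1])
    return (name, float(val))

max_schema = StructType([
    StructField("max_col", StringType()),
    StructField("max_val", DoubleType()),
])

row_max = F.udf(pick_max, max_schema)

result = (df
          .withColumn("m", row_max(*[F.col(c) for c in cols]))
          .select("*",
                  F.col("m.max_col").alias("max_col"),
                  F.col("m.max_val").alias("max_val"))
          .drop("m"))

result.show()
# +----+----+----+-------+-------+
# |col1|col2|col3|max_col|max_val|
# +----+----+----+-------+-------+
# | 0.2| 0.3| 0.4|   col3|    0.4|
# +----+----+----+-------+-------+

Note that functions.greatest (available since Spark 1.5) returns the row-wise
maximum value directly, but not the name of the column it came from, which is
why a UDF helps here.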

Pozdrawiam,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark http://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski



Re: comparing rows in a pyspark data frame

Posted by Ted Yu <yu...@gmail.com>.
Do you mean returning col3 and 0.4 for the example row below?

> On Jun 8, 2016, at 5:05 AM, pseudo oduesp <ps...@gmail.com> wrote:
> 
> Hi,
> How can we compare multiple columns in a DataFrame? I mean,
> 
> if df is a DataFrame like this:
> 
>                            df.col1 | df.col2 | df.col3
>                            0.2       0.3       0.4
> 
> how can we compare the columns to get the maximum of each row (not of each column), and get the name of the column where that maximum appears?
> 
> Thanks
