You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Somasundaram Sekar <so...@tigeranalytics.com> on 2017/10/08 11:59:42 UTC

Equivalent of Redshift ListAgg function in Spark (Pyspak)

Hi,



I want to concat multiple columns into a single column after grouping the
 DataFrame,



I want an functional equivalent of Redshift ListAgg function



pg_catalog.Listagg(column, '|')

         within GROUP( ORDER BY column) AS

name


LISTAGG Function

: For each group in a query, the LISTAGG aggregate function orders the rows
for that group according to the ORDER BY expression, then concatenates the
values into a single string.

RE: Equivalent of Redshift ListAgg function in Spark (Pyspak)

Posted by Mahesh Sawaiker <ma...@persistent.com>.
After doing group, you can use mkstring on the data frame. Following is an example where are columns are concatenated with space as a separator.



scala> call_cdf.map(row => row.mkString(" ")).show(false)
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value                                                                                                                                                                                                                                                                                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1 AAAAAAAABAAAAAAA 1998-01-01 null null 2450997 NY Metro large 2325 1374075 8AM-12AM Keith Cunningham 4 Matters may hear as; profita New, cold plants can put al Dante Cook 3 pri 4 ese 995 Park 3rd Dr. Suite 470 Five Points Ziebach County SD 56098 United States -6.0 0.02                                                                                                       |
|2 AAAAAAAACAAAAAAA 1998-01-01 2000-12-31 null 2450876 Mid Atlantic large 4208 837392 8AM-4PM Stephen Clem 3 Classes devote largely other, standard ter Free germans prove flatly industrial drugs. Low questions come to a equations. British, conservative Christopher Perez 6 cally 3 pri 245 Johnson  Circle Suite 200 Fairview Williamson County TN 35709 United States -5.0 0.03|
|3 AAAAAAAACAAAAAAA 2001-01-01 null null 2450876 Mid Atlantic small 3251 837392 8AM-4PM William Johnson 3 Classes devote largely other, standard ter Ridiculous requirements must not implement about pure values. Substances know powers. Political rel Derrick Burke 6 cally 3 pri 245 Johnson  Circle Suite 200 Fairview Williamson County TN 35709 United States -5.0 0.03        |
|4 AAAAAAAAEAAAAAAA 1998-01-01 2000-01-01 null 2450872 North Midwest large 2596 708708 8AM-4PM Lamont Greene 3 Events must find anyway  Great rates must ensure famous, other banks. As main goals get home as a  Marvin Dean 2 able 2 able 927 Oak Main ST Suite 150 Five Points Williamson County TN 36098 United States -5.0 0.03                                                  |
|5 AAAAAAAAEAAAAAAA 2000-01-02 2001-12-31 null 2450872 North Midwest medium 2596 708708 8AM-12AM Lamont Greene 3 Events must find anyway  So fresh supplies keep meanwhile religious, labour years. Rapid, careful subject Matthew Williams 2 able 1 able 927 Oak Main ST Suite 150 Five Points Williamson County TN 36098 United States -5.0 0.0                                     |
|6 AAAAAAAAEAAAAAAA 2002-01-01 null null 2450872 North Midwest small 2596 708708 8AM-4PM Emilio Romano 6 As well novel sentences check through the plans. Sophisticated cities fall for e William Johnson 5 anti 1 able 927 Oak Main ST Suite 150 Five Points Williamson County TN 36098 United States -5.0 0.07                                                                      |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+


+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
|cc_call_center_sk|cc_call_center_id|cc_rec_start_date|cc_rec_end_date|cc_closed_date_sk|cc_open_date_sk|      cc_name|cc_class|cc_employees|cc_sq_ft|cc_hours|      cc_manager|cc_mkt_id|        cc_mkt_class|         cc_mkt_desc|cc_market_manager|cc_division|cc_division_name|cc_company|cc_company_name|cc_street_number|cc_street_name|cc_street_type|cc_suite_number|    cc_city|        cc_county|cc_state|cc_zip|   cc_country|cc_gmt_offset|cc_tax_percentage|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+
|                1| AAAAAAAABAAAAAAA|       1998-01-01|           null|             null|        2450997|     NY Metro|   large|        2325| 1374075|8AM-12AM|Keith Cunningham|        4|Matters may hear ...|New, cold plants ...|       Dante Cook|          3|             pri|         4|            ese|             995|      Park 3rd|           Dr.|      Suite 470|Five Points|   Ziebach County|      SD| 56098|United States|         -6.0|             0.02|
|                2| AAAAAAAACAAAAAAA|       1998-01-01|     2000-12-31|             null|        2450876| Mid Atlantic|   large|        4208|  837392| 8AM-4PM|    Stephen Clem|        3|Classes devote la...|Free germans prov...|Christopher Perez|          6|           cally|         3|            pri|             245|      Johnson |        Circle|      Suite 200|   Fairview|Williamson County|      TN| 35709|United States|         -5.0|             0.03|
|                3| AAAAAAAACAAAAAAA|       2001-01-01|           null|             null|        2450876| Mid Atlantic|   small|        3251|  837392| 8AM-4PM| William Johnson|        3|Classes devote la...|Ridiculous requir...|    Derrick Burke|          6|           cally|         3|            pri|             245|      Johnson |        Circle|      Suite 200|   Fairview|Williamson County|      TN| 35709|United States|         -5.0|             0.03|
|                4| AAAAAAAAEAAAAAAA|       1998-01-01|     2000-01-01|             null|        2450872|North Midwest|   large|        2596|  708708| 8AM-4PM|   Lamont Greene|        3|Events must find ...|Great rates must ...|      Marvin Dean|          2|            able|         2|           able|             927|      Oak Main|            ST|      Suite 150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|             0.03|
|                5| AAAAAAAAEAAAAAAA|       2000-01-02|     2001-12-31|             null|        2450872|North Midwest|  medium|        2596|  708708|8AM-12AM|   Lamont Greene|        3|Events must find ...|So fresh supplies...| Matthew Williams|          2|            able|         1|           able|             927|      Oak Main|            ST|      Suite 150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|              0.0|
|                6| AAAAAAAAEAAAAAAA|       2002-01-01|           null|             null|        2450872|North Midwest|   small|        2596|  708708| 8AM-4PM|   Emilio Romano|        6|As well novel sen...|Sophisticated cit...|  William Johnson|          5|            anti|         1|           able|             927|      Oak Main|            ST|      Suite 150|Five Points|Williamson County|      TN| 36098|United States|         -5.0|             0.07|
+-----------------+-----------------+-----------------+---------------+-----------------+---------------+-------------+--------+------------+--------+--------+----------------+---------+--------------------+--------------------+-----------------+-----------+----------------+----------+---------------+----------------+--------------+--------------+---------------+-----------+-----------------+--------+------+-------------+-------------+-----------------+

From: Somasundaram Sekar [mailto:somasundar.sekar@tigeranalytics.com]
Sent: Sunday, October 08, 2017 5:30 PM
To: user@spark.apache.org
Subject: Equivalent of Redshift ListAgg function in Spark (Pyspak)

Hi,

I want to concat multiple columns into a single column after grouping the  DataFrame,

I want an functional equivalent of Redshift ListAgg function

pg_catalog.Listagg(column, '|')
         within GROUP( ORDER BY column) AS
name

LISTAGG Function
: For each group in a query, the LISTAGG aggregate function orders the rows for that group according to the ORDER BY expression, then concatenates the values into a single string.
DISCLAIMER
==========
This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.