You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Rex X <dn...@gmail.com> on 2016/08/26 11:54:36 UTC

How to make new composite columns by combining rows in the same group?

1. Given following CSV file

$cat data.csv

ID,City,Zip,Price,Rating1,A,95123,100,01,B,95124,102,11,A,95126,100,12,B,95123,200,02,B,95124,201,12,C,95124,203,03,A,95126,300,13,C,95124,280,04,C,95124,400,1


We want to group by ID, and make new composite columns of Price and Rating
based on the value of $City-$Zip.


2. The Expected Result:















  ID
  A_95123_Price
   A_95123_Rating
  A_95126_Price
  A_95126_Rating
  B_95123_Price
  B_95123_Rating
  B_95124_Price
  B_95124_Rating
  C_95124_Price
  C_95124_Rating


  1
  100
  1
  100
  2
  0
  0
  102
  2
  0
  0


  2
  0
  0
  0
  0
  200
  1
  201
  2
  203
  1


  3
  0
  0
  300
  2
  0
  0
  0
  0
  280
  1


  4
  0
  0
  0
  0
  0
  0
  0
  0
  400
  2

Any tips would be greatly appreciated!

Thank you.

Regards,
Rex

Re: How to make new composite columns by combining rows in the same group?

Posted by Rex X <dn...@gmail.com>.
The data.csv need to be corrected:


1. Given following CSV file
$cat data.csv

ID,City,Zip,Price,Rating
1,A,95123,100,1
1,B,95124,102,2
1,A,95126,100,2
2,B,95123,200,1
2,B,95124,201,2
2,C,95124,203,1
3,A,95126,300,2
3,C,95124,280,1
4,C,95124,400,2


On Fri, Aug 26, 2016 at 4:54 AM, Rex X <dn...@gmail.com> wrote:

> 1. Given following CSV file
>
> $cat data.csv
>
> ID,City,Zip,Price,Rating1,A,95123,100,01,B,95124,102,11,A,95126,100,12,B,95123,200,02,B,95124,201,12,C,95124,203,03,A,95126,300,13,C,95124,280,04,C,95124,400,1
>
>
> We want to group by ID, and make new composite columns of Price and Rating
> based on the value of $City-$Zip.
>
>
> 2. The Expected Result:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>   ID
>   A_95123_Price
>    A_95123_Rating
>   A_95126_Price
>   A_95126_Rating
>   B_95123_Price
>   B_95123_Rating
>   B_95124_Price
>   B_95124_Rating
>   C_95124_Price
>   C_95124_Rating
>
>
>   1
>   100
>   1
>   100
>   2
>   0
>   0
>   102
>   2
>   0
>   0
>
>
>   2
>   0
>   0
>   0
>   0
>   200
>   1
>   201
>   2
>   203
>   1
>
>
>   3
>   0
>   0
>   300
>   2
>   0
>   0
>   0
>   0
>   280
>   1
>
>
>   4
>   0
>   0
>   0
>   0
>   0
>   0
>   0
>   0
>   400
>   2
>
> Any tips would be greatly appreciated!
>
> Thank you.
>
> Regards,
> Rex
>
>

Re: How to make new composite columns by combining rows in the same group?

Posted by Xinh Huynh <xi...@gmail.com>.
That looks like a pivot table. Have you looked into using the pivot table method with DataFrames?

Xinh

> On Aug 26, 2016, at 4:54 AM, Rex X <dn...@gmail.com> wrote:
> 
> 1. Given following CSV file
> $cat data.csv
> 
> ID,City,Zip,Price,Rating
> 1,A,95123,100,0
> 1,B,95124,102,1
> 1,A,95126,100,1
> 2,B,95123,200,0
> 2,B,95124,201,1
> 2,C,95124,203,0
> 3,A,95126,300,1
> 3,C,95124,280,0
> 4,C,95124,400,1
> 
> We want to group by ID, and make new composite columns of Price and Rating based on the value of $City-$Zip. 
> 
> 
> 2. The Expected Result:
> 
> ID	A_95123_Price	 A_95123_Rating	A_95126_Price	A_95126_Rating	B_95123_Price	B_95123_Rating	B_95124_Price	B_95124_Rating	C_95124_Price	C_95124_Rating
> 1	100	1	100	2	0	0	102	2	0	0
> 2	0	0	0	0	200	1	201	2	203	1
> 3	0	0	300	2	0	0	0	0	280	1
> 4	0	0	0	0	0	0	0	0	400	2
> 
> Any tips would be greatly appreciated!
> 
> Thank you.
> 
> Regards,
> Rex
>