You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Rex X <dn...@gmail.com> on 2016/08/26 11:54:36 UTC
How to make new composite columns by combining rows in the same group?
1. Given following CSV file
$cat data.csv
ID,City,Zip,Price,Rating1,A,95123,100,01,B,95124,102,11,A,95126,100,12,B,95123,200,02,B,95124,201,12,C,95124,203,03,A,95126,300,13,C,95124,280,04,C,95124,400,1
We want to group by ID, and make new composite columns of Price and Rating
based on the value of $City-$Zip.
2. The Expected Result:
ID
A_95123_Price
A_95123_Rating
A_95126_Price
A_95126_Rating
B_95123_Price
B_95123_Rating
B_95124_Price
B_95124_Rating
C_95124_Price
C_95124_Rating
1
100
1
100
2
0
0
102
2
0
0
2
0
0
0
0
200
1
201
2
203
1
3
0
0
300
2
0
0
0
0
280
1
4
0
0
0
0
0
0
0
0
400
2
Any tips would be greatly appreciated!
Thank you.
Regards,
Rex
Re: How to make new composite columns by combining rows in the same group?
Posted by Rex X <dn...@gmail.com>.
The data.csv need to be corrected:
1. Given following CSV file
$cat data.csv
ID,City,Zip,Price,Rating
1,A,95123,100,1
1,B,95124,102,2
1,A,95126,100,2
2,B,95123,200,1
2,B,95124,201,2
2,C,95124,203,1
3,A,95126,300,2
3,C,95124,280,1
4,C,95124,400,2
On Fri, Aug 26, 2016 at 4:54 AM, Rex X <dn...@gmail.com> wrote:
> 1. Given following CSV file
>
> $cat data.csv
>
> ID,City,Zip,Price,Rating1,A,95123,100,01,B,95124,102,11,A,95126,100,12,B,95123,200,02,B,95124,201,12,C,95124,203,03,A,95126,300,13,C,95124,280,04,C,95124,400,1
>
>
> We want to group by ID, and make new composite columns of Price and Rating
> based on the value of $City-$Zip.
>
>
> 2. The Expected Result:
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> ID
> A_95123_Price
> A_95123_Rating
> A_95126_Price
> A_95126_Rating
> B_95123_Price
> B_95123_Rating
> B_95124_Price
> B_95124_Rating
> C_95124_Price
> C_95124_Rating
>
>
> 1
> 100
> 1
> 100
> 2
> 0
> 0
> 102
> 2
> 0
> 0
>
>
> 2
> 0
> 0
> 0
> 0
> 200
> 1
> 201
> 2
> 203
> 1
>
>
> 3
> 0
> 0
> 300
> 2
> 0
> 0
> 0
> 0
> 280
> 1
>
>
> 4
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 400
> 2
>
> Any tips would be greatly appreciated!
>
> Thank you.
>
> Regards,
> Rex
>
>
Re: How to make new composite columns by combining rows in the same group?
Posted by Xinh Huynh <xi...@gmail.com>.
That looks like a pivot table. Have you looked into using the pivot table method with DataFrames?
Xinh
> On Aug 26, 2016, at 4:54 AM, Rex X <dn...@gmail.com> wrote:
>
> 1. Given following CSV file
> $cat data.csv
>
> ID,City,Zip,Price,Rating
> 1,A,95123,100,0
> 1,B,95124,102,1
> 1,A,95126,100,1
> 2,B,95123,200,0
> 2,B,95124,201,1
> 2,C,95124,203,0
> 3,A,95126,300,1
> 3,C,95124,280,0
> 4,C,95124,400,1
>
> We want to group by ID, and make new composite columns of Price and Rating based on the value of $City-$Zip.
>
>
> 2. The Expected Result:
>
> ID A_95123_Price A_95123_Rating A_95126_Price A_95126_Rating B_95123_Price B_95123_Rating B_95124_Price B_95124_Rating C_95124_Price C_95124_Rating
> 1 100 1 100 2 0 0 102 2 0 0
> 2 0 0 0 0 200 1 201 2 203 1
> 3 0 0 300 2 0 0 0 0 280 1
> 4 0 0 0 0 0 0 0 0 400 2
>
> Any tips would be greatly appreciated!
>
> Thank you.
>
> Regards,
> Rex
>