Posted to user@spark.apache.org by pseudo oduesp <ps...@gmail.com> on 2016/07/26 08:39:25 UTC

PCA machine learning

Hi,
When I perform PCA dimensionality reduction, I get a dense vector whose
length equals the number of principal components. My questions:

 - How do I get the names of the features that correspond to this vector?
 - Are the values inside the result vector the projections of all the
   features onto these components?
 - How do I use it?

Thanks
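
To illustrate the second question: each entry of a PCA output vector is the dot product of the input feature vector with one principal component, so every original feature contributes to every entry. Below is a minimal pure-Python sketch, not Spark code. The 3-feature row, the `loadings` matrix, and the `project` helper are all invented for illustration. The layout (one row per original feature, one column per component) is an assumption meant to mirror what a fitted PCA model typically stores.

```python
# Hypothetical loading matrix: one row per original feature, one column per
# principal component. The numbers are invented for illustration only and
# are not normalized the way real PCA loadings would be.
loadings = [
    [0.5, 0.0],   # feature 0
    [0.5, 1.0],   # feature 1
    [0.0, -2.0],  # feature 2
]


def project(features, loadings):
    """Return one value per component: the dot product of the feature
    vector with that component's column of loadings."""
    n_components = len(loadings[0])
    return [
        sum(features[i] * loadings[i][j] for i in range(len(features)))
        for j in range(n_components)
    ]


row = [2.0, 4.0, 1.0]
projected = project(row, loadings)
print(projected)  # [3.0, 2.0]
```

The output has one value per component, not one per input feature, which is why its length matches the number of components requested.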

Re: PCA machine learning

Posted by pseudo oduesp <ps...@gmail.com>.
Hi,
I want to add some points.
I get the following two vectors. The first one is the features vector:
Row(features=SparseVector(765, {0: 3.0, 1: 1.0, 2: 50.0, 3: 16.0, 5:
88021.0, 6: 88021.0, 8: 1.0, 11: 1.0, 12: 200.0, 14: 200.0, 15: 200.0, 16:
200.0, 17: 2.0, 18: 1.0, 25: 1.0, 26: 2.0, 31: 89200.0, 32: 65.0, 33: 1.0,
34: 89020044.0, 35: 1.0, 36: 1.0, 42: 4.0, 43: 24.0, 44: 2274.0, 45: 54.0,
46: 34.0, 47: 44.0, 48: 2654.0, 49: 2934.0, 50: 84.0, 56: 3404.0, 57: 16.0,
59: 1.0, 70: 1.0, 75: 1.0, 76: 1.0, 77: 1.0, 78: 1.0, 79: 1.0, 80: 1.0, 81:
1.0, 82: 1.0, 83: 1.0, 84: 1.0, 85: 1.0, 86: 1.0, 87: 1.0, 88: 1.0, 89:
1.0, 91: 1.0, 92: 1.0, 93: 1.0, 94: 1.0, 95: 1.0, 96: 1.0, 97: 1.0, 98:
1.0, 99: 1.0, 100: 1.0, 102: 1.0, 137: 1.0, 139: 1.0, 141: 1.0, 150: 1.0,
155: 1.0, 158: 1.0, 160: 1.0, 259: 0.61, 260: 0.61, 261: 0.61, 262: 0.61,
263: 1.0, 264: 0.61, 265: 0.61, 266: 0.61, 267: 0.61, 268: 1.0, 269: 0.61,
270: 0.61, 271: 0.61, 272: 0.61, 273: 1.0, 274: 0.61, 275: 0.61, 276: 0.61,
277: 0.61, 278: 1.0, 281: 916.57, 282: 916.57, 283: 916.57, 284: 865.43,
285: 865.43, 286: 865.43, 287: 816.19, 288: 816.19, 289: 816.19, 290:
760.53, 291: 760.53, 292: 760.53, 293: 874.9, 294: 874.9, 295: 874.9, 296:
963.89, 297: 172.9, 298: 73.64, 299: 1.87, 300: 349.53, 301: 109.95, 302:
116.67, 303: 38.59, 304: 68.28, 305: 2.23, 313: 1.0, 314: 1.0, 315: 1.0,
316: 1.0, 317: 1.0, 318: 1.0, 319: 1.0, 320: 1.0, 321: 1.0, 322: 1.0, 323:
109.95, 324: 172.9, 325: 116.67, 326: 38.59, 327: 2.23, 328: 73.64, 329:
1.87, 330: 349.53, 331: 68.28, 332: 180.46, 333: 933.66, 334: 916.57, 335:
1.0, 336: 1.0, 337: 1.0, 338: 1.0, 339: 1.0, 340: 1.0, 341: 1.0, 342: 1.0,
343: 1.0, 344: 166.231, 345: 323.713, 346: 104.988, 347: 104.988, 348:
34.996, 350: 69.992, 352: 61.243, 353: 166.231, 354: 323.713, 355: 104.988,
356: 104.988, 357: 34.996, 359: 69.992, 361: 61.243, 364: 1.0, 365: 1.0,
366: 1.0, 367: 1.0, 368: 1.0, 369: 1.0, 370: 1.0, 371: 1.0, 372: 1.0, 373:
144.5007, 374: 281.3961, 375: 91.2636, 376: 91.2636, 377: 30.4212, 379:
60.8424, 381: 53.2371, 382: 144.5007, 383: 281.3961, 384: 91.2636, 385:
91.2636, 386: 30.4212, 388: 60.8424, 390: 53.2371, 393: 1.0, 394: 1.0, 395:
1.0, 396: 1.0, 397: 1.0, 398: 1.0, 399: 1.0, 400: 1.0, 401: 1.0, 402:
155.0761, 403: 301.9903, 404: 97.9428, 405: 97.9428, 406: 32.6476, 408:
65.2952, 410: 57.1333, 411: 155.0761, 412: 301.9903, 413: 97.9428, 414:
97.9428, 415: 32.6476, 417: 65.2952, 419: 57.1333, 422: 1.0, 423: 1.0, 424:
1.0, 425: 1.0, 426: 1.0, 427: 1.0, 428: 1.0, 429: 1.0, 430: 1.0, 431:
164.4317, 432: 320.2091, 433: 103.8516, 434: 103.8516, 435: 34.6172, 437:
69.2344, 439: 60.5801, 440: 164.4317, 441: 320.2091, 442: 103.8516, 443:
103.8516, 444: 34.6172, 446: 69.2344, 448: 60.5801, 451: 1.0, 452: 1.0,
453: 1.0, 454: 1.0, 455: 1.0, 456: 1.0, 457: 1.0, 458: 1.0, 459: 1.0, 460:
174.1483, 461: 339.1309, 462: 109.9884, 463: 109.9884, 464: 36.6628, 466:
73.3256, 468: 64.1599, 469: 174.1483, 470: 339.1309, 471: 109.9884, 472:
109.9884, 473: 36.6628, 475: 73.3256, 477: 64.1599, 480: 0.0001, 481:
0.0001, 482: 0.0001, 483: 0.0001, 484: 0.0001, 485: 0.0001, 486: 0.0001,
487: 0.0001, 488: 172.9, 489: 172.9, 490: 172.9, 491: 172.9, 492: 283.4426,
493: 283.4426, 494: 283.4426, 495: 283.4426, 504: 73.64, 505: 73.64, 506:
73.64, 507: 73.64, 508: 1207213.1148, 509: 1207213.1148, 510: 1207213.1148,
511: 1207213.1148, 520: 1.87, 521: 1.87, 522: 1.87, 523: 1.87, 524:
30655.7377, 525: 30655.7377, 526: 30655.7377, 527: 30655.7377, 536: 349.53,
537: 349.53, 538: 349.53, 539: 349.53, 540: 573.0, 541: 573.0, 542: 573.0,
543: 573.0, 552: 116.67, 553: 116.67, 554: 116.67, 555: 116.67, 556:
191.2623, 557: 191.2623, 558: 191.2623, 559: 191.2623, 568: 38.59, 569:
38.59, 570: 38.59, 571: 38.59, 572: 38.59, 573: 38.59, 574: 38.59, 575:
38.59, 584: 180.46, 585: 180.46, 586: 180.46, 587: 180.46, 588: 295.8361,
589: 295.8361, 590: 295.8361, 591: 295.8361, 600: 933.66, 601: 933.66, 602:
933.66, 603: 933.66, 604: 1239250.9834, 605: 1239250.9834, 606:
1239250.9834, 607: 1239250.9834, 643: 170.0, 644: 170.0, 646: 170.0, 648:
170.0, 658: 170.0, 662: 170.0, 665: 170.0, 667: 170.0, 758: 0.224, 763:
0.224}),


and the second one is its projection onto 20 principal components:


pca_features=DenseVector([89036409.0534, 2986242.0691, 227234.8184,
108796.4282, -129553.463, 89983.1029, 223420.7277, 53740.2034,
-113602.7292, -20057.1001, 33872.3162, -759.2689, 410.2222, -872.6325,
-4896.6554, 4060.5014, -786.3297, -951.3851, 68464.2515, 3850.9394,
876.7108, 98.5793, 21342.2015, 863.9765, 1456.3933, -265.2494, 85325.4192,
-3657.0752, 111.7979, -59.6176, -945.8667, -84.1924, 246.233, -636.8786,
-749.1798, 900.8763, -177.4543, -105.4379, 272.7857, -535.0951]))]


When I created the vector from the original DataFrame, I kept the order of
my columns, so I can associate each value in the features vector with the
name of a variable.

How can I identify the names of the principal components in the second vector?
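
A note on the question above: a principal component does not carry a name of its own, because each one is a weighted mix of all 765 input columns. One common way to interpret a component is to list the original columns with the largest absolute weights in it. The sketch below shows that idea in plain Python. The column names, the `loadings` values, and the `top_features` helper are hypothetical. In PySpark, the fitted `PCAModel` should expose such a matrix as its `pc` attribute (features by components) in Spark 2.0 and later, but treat that as an assumption to check against your version.

```python
def top_features(loadings, names, component, n=2):
    """Return the n (name, weight) pairs with the largest absolute weight
    in the given component column."""
    weights = [(names[i], loadings[i][component]) for i in range(len(names))]
    return sorted(weights, key=lambda nw: abs(nw[1]), reverse=True)[:n]


# Hypothetical column names, in the same order used to assemble the vector.
names = ["age", "income", "balance"]

# Invented loadings: rows follow `names`, columns are components.
loadings = [
    [0.1, 0.9],
    [0.7, -0.3],
    [-0.6, 0.2],
]

for c in range(2):
    print(c, top_features(loadings, names, c))
```

This relies on the column order used when building the features vector, which is why keeping that order, as described above, matters.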





