Posted to user@spark.apache.org by Justin Miller <ju...@protectwise.com> on 2017/04/05 17:03:41 UTC

Spark Streaming Kafka Job has strange behavior for certain tasks

Greetings!

I've been running various Spark Streaming jobs to persist data from Kafka topics, and one persister in particular seems to have issues. I've verified that the number of messages is roughly the same per partition, and the volume of data is a fraction of the volume of other persisters that appear to be working fine.

The tasks appear to go fine until approximately 74-80 of the 96 tasks have finished, and then the remaining tasks take much longer (roughly 310 seconds each in the log below, versus 10-13 seconds for the earlier ones). I'm using EMR/Spark 2.1.0/Kafka 0.10.0.1/EMRFS (EMR's S3 solution). Any help would be greatly appreciated!

Here's the code I'm using to do the transformation:

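// Append here is org.apache.spark.sql.SaveMode.Append (imported elsewhere).
val transformedData = transformer(sqlContext.createDataFrame(values, converter.schema))

// Write the batch as Parquet, partitioned by the configured columns.
transformedData
  .write
  .mode(Append)
  .partitionBy(persisterConfig.partitioning: _*)
  .format("parquet")
  .save(parquetPath)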

Here's the output of the job as it's running (the flow is thrift -> parquet/snappy -> s3). The files are roughly the same size (96 files per 10-minute window):

17/04/05 16:43:43 INFO TaskSetManager: Finished task 72.0 in stage 7.0 (TID 722) in 10089 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 57) (1/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 58.0 in stage 7.0 (TID 680) in 10099 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 90) (2/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 81.0 in stage 7.0 (TID 687) in 10244 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 8) (3/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 23.0 in stage 7.0 (TID 736) in 10236 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 82) (4/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 52.0 in stage 7.0 (TID 730) in 10275 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 78) (5/96)
17/04/05 16:43:43 INFO TaskSetManager: Finished task 45.0 in stage 7.0 (TID 691) in 10289 ms on ip-172-20-215-172.us-west-2.compute.internal (executor 41) (6/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 13.0 in stage 7.0 (TID 712) in 10532 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 65) (7/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 42.0 in stage 7.0 (TID 694) in 10595 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 18) (8/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 2.0 in stage 7.0 (TID 763) in 10623 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 74) (9/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 82.0 in stage 7.0 (TID 727) in 10631 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 72) (10/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 69.0 in stage 7.0 (TID 729) in 10716 ms on ip-172-20-215-172.us-west-2.compute.internal (executor 55) (11/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 65.0 in stage 7.0 (TID 673) in 10733 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 67) (12/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 15.0 in stage 7.0 (TID 684) in 10737 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 85) (13/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 27.0 in stage 7.0 (TID 748) in 10747 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 10) (14/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 46.0 in stage 7.0 (TID 699) in 10834 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 48) (15/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 6.0 in stage 7.0 (TID 719) in 10838 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 52) (16/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 11.0 in stage 7.0 (TID 739) in 10892 ms on ip-172-20-215-172.us-west-2.compute.internal (executor 83) (17/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 88.0 in stage 7.0 (TID 697) in 10900 ms on ip-172-20-212-43.us-west-2.compute.internal (executor 70) (18/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 35.0 in stage 7.0 (TID 678) in 10909 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 77) (19/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 0.0 in stage 7.0 (TID 700) in 10906 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 46) (20/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 36.0 in stage 7.0 (TID 732) in 10935 ms on ip-172-20-215-172.us-west-2.compute.internal (executor 69) (21/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 19.0 in stage 7.0 (TID 759) in 10948 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 37) (22/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 41.0 in stage 7.0 (TID 703) in 11013 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 81) (23/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 8.0 in stage 7.0 (TID 745) in 11007 ms on ip-172-20-215-172.us-west-2.compute.internal (executor 13) (24/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 12.0 in stage 7.0 (TID 742) in 11014 ms on ip-172-20-212-43.us-west-2.compute.internal (executor 56) (25/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 55.0 in stage 7.0 (TID 734) in 11105 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 6) (26/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 48.0 in stage 7.0 (TID 698) in 11139 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 20) (27/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 64.0 in stage 7.0 (TID 685) in 11160 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 63) (28/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 33.0 in stage 7.0 (TID 708) in 11168 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 22) (29/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 53.0 in stage 7.0 (TID 749) in 11165 ms on ip-172-20-215-172.us-west-2.compute.internal (executor 27) (30/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 91.0 in stage 7.0 (TID 723) in 11179 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 59) (31/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 34.0 in stage 7.0 (TID 743) in 11187 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 32) (32/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 32.0 in stage 7.0 (TID 676) in 11201 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 25) (33/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 59.0 in stage 7.0 (TID 755) in 11191 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 33) (34/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 57.0 in stage 7.0 (TID 738) in 11206 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 71) (35/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 17.0 in stage 7.0 (TID 728) in 11226 ms on ip-172-20-212-43.us-west-2.compute.internal (executor 28) (36/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 47.0 in stage 7.0 (TID 689) in 11233 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 51) (37/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 70.0 in stage 7.0 (TID 737) in 11228 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 92) (38/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 79.0 in stage 7.0 (TID 710) in 11238 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 88) (39/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 80.0 in stage 7.0 (TID 679) in 11253 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 16) (40/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 31.0 in stage 7.0 (TID 746) in 11298 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 23) (41/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 89.0 in stage 7.0 (TID 718) in 11314 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 66) (42/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 77.0 in stage 7.0 (TID 706) in 11329 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 93) (43/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 95.0 in stage 7.0 (TID 767) in 11365 ms on ip-172-20-212-43.us-west-2.compute.internal (executor 42) (44/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 43.0 in stage 7.0 (TID 696) in 11382 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 39) (45/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 71.0 in stage 7.0 (TID 713) in 11426 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 21) (46/96)
17/04/05 16:43:44 INFO TaskSetManager: Finished task 20.0 in stage 7.0 (TID 721) in 11437 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 7) (47/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 60.0 in stage 7.0 (TID 733) in 11534 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 43) (48/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 21.0 in stage 7.0 (TID 741) in 11548 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 11) (49/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 66.0 in stage 7.0 (TID 758) in 11657 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 35) (50/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 40.0 in stage 7.0 (TID 765) in 11659 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 73) (51/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 49.0 in stage 7.0 (TID 702) in 11711 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 68) (52/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 22.0 in stage 7.0 (TID 754) in 11732 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 2) (53/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 54.0 in stage 7.0 (TID 711) in 11784 ms on ip-172-20-212-43.us-west-2.compute.internal (executor 14) (54/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 78.0 in stage 7.0 (TID 675) in 11837 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 87) (55/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 7.0 in stage 7.0 (TID 701) in 11842 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 45) (56/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 14.0 in stage 7.0 (TID 747) in 11839 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 34) (57/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 26.0 in stage 7.0 (TID 760) in 11888 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 54) (58/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 9.0 in stage 7.0 (TID 693) in 11911 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 94) (59/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 76.0 in stage 7.0 (TID 750) in 11961 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 49) (60/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 30.0 in stage 7.0 (TID 764) in 12031 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 40) (61/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 39.0 in stage 7.0 (TID 674) in 12084 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 12) (62/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 29.0 in stage 7.0 (TID 740) in 12091 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 47) (63/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 61.0 in stage 7.0 (TID 683) in 12163 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 62) (64/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 50.0 in stage 7.0 (TID 705) in 12185 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 44) (65/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 10.0 in stage 7.0 (TID 707) in 12266 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 61) (66/96)
17/04/05 16:43:45 INFO TaskSetManager: Finished task 62.0 in stage 7.0 (TID 688) in 12374 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 89) (67/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 5.0 in stage 7.0 (TID 752) in 12491 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 9) (68/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 83.0 in stage 7.0 (TID 751) in 12649 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 26) (69/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 67.0 in stage 7.0 (TID 682) in 12724 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 38) (70/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 90.0 in stage 7.0 (TID 756) in 12825 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 30) (71/96)
17/04/05 16:43:46 INFO TaskSetManager: Finished task 25.0 in stage 7.0 (TID 757) in 13302 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 58) (72/96)
17/04/05 16:43:47 INFO TaskSetManager: Finished task 28.0 in stage 7.0 (TID 735) in 13667 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 17) (73/96)
17/04/05 16:44:07 INFO TaskSetManager: Finished task 93.0 in stage 7.0 (TID 681) in 33805 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 31) (74/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 87.0 in stage 7.0 (TID 744) in 310121 ms on ip-172-20-223-100.us-west-2.compute.internal (executor 80) (75/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 3.0 in stage 7.0 (TID 709) in 310221 ms on ip-172-20-212-63.us-west-2.compute.internal (executor 91) (76/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 85.0 in stage 7.0 (TID 726) in 310370 ms on ip-172-20-209-248.us-west-2.compute.internal (executor 96) (77/96)
17/04/05 16:48:43 INFO TaskSetManager: Finished task 38.0 in stage 7.0 (TID 725) in 310391 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 75) (78/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 37.0 in stage 7.0 (TID 766) in 310617 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 19) (79/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 16.0 in stage 7.0 (TID 720) in 310678 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 64) (80/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 68.0 in stage 7.0 (TID 753) in 310779 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 50) (81/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 24.0 in stage 7.0 (TID 695) in 310802 ms on ip-172-20-212-76.us-west-2.compute.internal (executor 86) (82/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 86.0 in stage 7.0 (TID 714) in 310808 ms on ip-172-20-218-144.us-west-2.compute.internal (executor 36) (83/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 51.0 in stage 7.0 (TID 716) in 310837 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 24) (84/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 92.0 in stage 7.0 (TID 761) in 310858 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 1) (85/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 75.0 in stage 7.0 (TID 672) in 310995 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 29) (86/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 1.0 in stage 7.0 (TID 715) in 311159 ms on ip-172-20-212-43.us-west-2.compute.internal (executor 84) (87/96)
17/04/05 16:48:44 INFO TaskSetManager: Finished task 4.0 in stage 7.0 (TID 677) in 311443 ms on ip-172-20-220-110.us-west-2.compute.internal (executor 3) (88/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 73.0 in stage 7.0 (TID 690) in 311523 ms on ip-172-20-218-229.us-west-2.compute.internal (executor 76) (89/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 84.0 in stage 7.0 (TID 686) in 311554 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 60) (90/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 44.0 in stage 7.0 (TID 692) in 312165 ms on ip-172-20-208-230.us-west-2.compute.internal (executor 4) (91/96)
17/04/05 16:48:45 INFO TaskSetManager: Finished task 63.0 in stage 7.0 (TID 762) in 312299 ms on ip-172-20-211-125.us-west-2.compute.internal (executor 79) (92/96)
17/04/05 16:48:46 INFO TaskSetManager: Finished task 94.0 in stage 7.0 (TID 724) in 313148 ms on ip-172-20-219-239.us-west-2.compute.internal (executor 5) (93/96)
17/04/05 16:48:46 INFO TaskSetManager: Finished task 18.0 in stage 7.0 (TID 717) in 313332 ms on ip-172-20-213-64.us-west-2.compute.internal (executor 15) (94/96)
17/04/05 16:48:48 INFO TaskSetManager: Finished task 56.0 in stage 7.0 (TID 731) in 314838 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 95) (95/96)
17/04/05 16:48:52 INFO TaskSetManager: Finished task 74.0 in stage 7.0 (TID 704) in 318573 ms on ip-172-20-217-201.us-west-2.compute.internal (executor 53) (96/96)
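
To make the split in the log above easier to quantify, here is a minimal sketch in plain Scala that parses the "Finished task" lines and flags tasks that ran much longer than the median. Everything in it (StragglerSummary, TaskRun, the regex, and the 3x-median threshold) is a hypothetical helper written for illustration, not part of Spark; it only assumes the log line format shown above.

```scala
// Hypothetical helper, not part of Spark: summarizes task durations
// from TaskSetManager "Finished task" driver log lines.
object StragglerSummary {

  // Matches the part of a line such as:
  // "Finished task 72.0 in stage 7.0 (TID 722) in 10089 ms on <host> (executor 57) (1/96)"
  private val Finished =
    """Finished task (\d+)\.\d+ in stage \S+ \(TID (\d+)\) in (\d+) ms on (\S+) \(executor (\d+)\)""".r.unanchored

  final case class TaskRun(task: Int, tid: Int, millis: Long, host: String, executor: Int)

  // Returns None for any line that is not a "Finished task" line.
  def parse(line: String): Option[TaskRun] = line match {
    case Finished(task, tid, ms, host, exec) =>
      Some(TaskRun(task.toInt, tid.toInt, ms.toLong, host, exec.toInt))
    case _ => None
  }

  // Tasks whose duration exceeds `factor` times the median duration.
  // The default factor of 3.0 is an arbitrary assumption; tune as needed.
  def stragglers(runs: Seq[TaskRun], factor: Double = 3.0): Seq[TaskRun] =
    if (runs.isEmpty) Seq.empty
    else {
      val sorted = runs.map(_.millis).sorted
      val median = sorted(sorted.size / 2)
      runs.filter(_.millis > median * factor)
    }

  // Number of flagged tasks per host, to check whether slow tasks cluster on a few nodes.
  def stragglersByHost(runs: Seq[TaskRun], factor: Double = 3.0): Map[String, Int] =
    stragglers(runs, factor).groupBy(_.host).map { case (host, rs) => host -> rs.size }
}
```

Feeding the driver log through `lines.flatMap(StragglerSummary.parse)` and then calling `stragglersByHost` should show whether the slow tasks are spread across all hosts or concentrated on a few.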

Thanks,
Justin

