Posted to issues@spark.apache.org by "jackielihf (JIRA)" <ji...@apache.org> on 2014/06/25 09:14:24 UTC

[jira] [Updated] (SPARK-2274) spark SQL query hang up sometime

     [ https://issues.apache.org/jira/browse/SPARK-2274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jackielihf updated SPARK-2274:
------------------------------

    Description: 
When I run Spark SQL queries, they sometimes hang:
1) A simple SQL query works, such as "select * from a left outer join b on a.id=b.id".
2) BUT a query with more joins, such as "select * from a left outer join b on a.id=b.id left outer join c on a.id=c.id ...", makes the spark shell appear to hang.

spark shell prints:
scala> hc.hql("select A.id,A.tit,A.sub_tit,B.abst,B.cont,A.aut,A.com_name,A.med_name,A.pub_dt,A.upd_time,A.ent_time,A.info_lvl,A.is_pic,A.lnk_addr,A.is_ann,A.info_open_lvl,A.keyw_name,C.typ_code as type,D.evt_dir,D.evt_st,E.secu_id,E.typ_codeii,E.exch_code,E.trd_code,E.secu_sht from txt_nws_bas_update A left outer join txt_nws_bas_txt B on A.id=B.orig_id left outer join txt_nws_typ C on A.id=C.orig_id left outer join txt_nws_secu D on A.id=D.orig_id left outer join bas_secu_info E on D.secu_id=E.secu_id where D.secu_id is not null limit 5").foreach(println)
14/06/25 13:32:25 INFO ParseDriver: Parsing command: select A.id,A.tit,A.sub_tit,B.abst,B.cont,A.aut,A.com_name,A.med_name,A.pub_dt,A.upd_time,A.ent_time,A.info_lvl,A.is_pic,A.lnk_addr,A.is_ann,A.info_open_lvl,A.keyw_name,C.typ_code as type,D.evt_dir,D.evt_st,E.secu_id,E.typ_codeii,E.exch_code,E.trd_code,E.secu_sht from txt_nws_bas_update A left outer join txt_nws_bas_txt B on A.id=B.orig_id left outer join txt_nws_typ C on A.id=C.orig_id left outer join txt_nws_secu D on A.id=D.orig_id left outer join bas_secu_info E on D.secu_id=E.secu_id where D.secu_id is not null limit 5
14/06/25 13:32:25 INFO ParseDriver: Parse Completed
14/06/25 13:32:25 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
14/06/25 13:32:25 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
14/06/25 13:32:27 INFO MemoryStore: ensureFreeSpace(220923) called with curMem=0, maxMem=311387750
14/06/25 13:32:27 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 215.7 KB, free 296.8 MB)
14/06/25 13:32:27 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=220923, maxMem=311387750
14/06/25 13:32:27 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 215.8 KB, free 296.5 MB)
14/06/25 13:32:28 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=441894, maxMem=311387750
14/06/25 13:32:28 INFO MemoryStore: Block broadcast_2 stored as values to memory (estimated size 215.8 KB, free 296.3 MB)
14/06/25 13:32:28 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=662865, maxMem=311387750
14/06/25 13:32:28 INFO MemoryStore: Block broadcast_3 stored as values to memory (estimated size 215.8 KB, free 296.1 MB)
14/06/25 13:32:28 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=883836, maxMem=311387750
14/06/25 13:32:28 INFO MemoryStore: Block broadcast_4 stored as values to memory (estimated size 215.8 KB, free 295.9 MB)
14/06/25 13:32:29 INFO SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
14/06/25 13:32:29 INFO SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
14/06/25 13:32:30 INFO FileInputFormat: Total input paths to process : 1
14/06/25 13:32:30 INFO SparkContext: Starting job: collect at joins.scala:184
14/06/25 13:32:30 INFO DAGScheduler: Got job 0 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
14/06/25 13:32:30 INFO DAGScheduler: Final stage: Stage 0(collect at joins.scala:184)
14/06/25 13:32:30 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:32:30 INFO DAGScheduler: Missing parents: List()
14/06/25 13:32:30 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at joins.scala:184), which has no missing parents
14/06/25 13:32:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[7] at map at joins.scala:184)
14/06/25 13:32:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
14/06/25 13:32:30 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:32:30 INFO TaskSetManager: Serialized task 0.0:0 as 4088 bytes in 3 ms
14/06/25 13:32:30 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:32:30 INFO TaskSetManager: Serialized task 0.0:1 as 4088 bytes in 2 ms
14/06/25 13:32:44 INFO BlockManagerInfo: Added taskresult_1 in memory on 192.168.56.100:47102 (size: 15.0 MB, free: 281.9 MB)
14/06/25 13:32:44 INFO SendingConnection: Initiating connection to [/192.168.56.100:47102]
14/06/25 13:32:44 INFO SendingConnection: Connected to [/192.168.56.100:47102], 1 messages pending
14/06/25 13:32:44 INFO BlockManagerInfo: Added taskresult_0 in memory on 192.168.56.100:41322 (size: 15.1 MB, free: 281.9 MB)
14/06/25 13:32:44 INFO SendingConnection: Initiating connection to [/192.168.56.100:41322]
14/06/25 13:32:44 INFO SendingConnection: Connected to [/192.168.56.100:41322], 1 messages pending
14/06/25 13:32:49 INFO ConnectionManager: Accepted connection from [192.168.56.100/192.168.56.100]
14/06/25 13:32:55 INFO BlockManagerInfo: Removed taskresult_1 on 192.168.56.100:47102 in memory (size: 15.0 MB, free: 297.0 MB)
14/06/25 13:32:55 INFO DAGScheduler: Completed ResultTask(0, 1)
14/06/25 13:32:55 INFO TaskSetManager: Finished TID 1 in 24749 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:10 INFO ConnectionManager: Accepted connection from [192.168.56.100/192.168.56.100]
14/06/25 13:33:15 INFO BlockManagerInfo: Removed taskresult_0 on 192.168.56.100:41322 in memory (size: 15.1 MB, free: 297.0 MB)
14/06/25 13:33:15 INFO DAGScheduler: Completed ResultTask(0, 0)
14/06/25 13:33:15 INFO TaskSetManager: Finished TID 0 in 45025 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:15 INFO DAGScheduler: Stage 0 (collect at joins.scala:184) finished in 45.665 s
14/06/25 13:33:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/06/25 13:33:16 INFO SparkContext: Job finished: collect at joins.scala:184, took 45.910877269 s
14/06/25 13:33:20 INFO MemoryStore: ensureFreeSpace(133210496) called with curMem=1104807, maxMem=311387750
14/06/25 13:33:20 INFO MemoryStore: Block broadcast_5 stored as values to memory (estimated size 127.0 MB, free 168.9 MB)
14/06/25 13:33:22 INFO FileInputFormat: Total input paths to process : 1
14/06/25 13:33:22 INFO SparkContext: Starting job: collect at joins.scala:184
14/06/25 13:33:22 INFO DAGScheduler: Got job 1 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
14/06/25 13:33:22 INFO DAGScheduler: Final stage: Stage 1(collect at joins.scala:184)
14/06/25 13:33:22 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:22 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:22 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[12] at map at joins.scala:184), which has no missing parents
14/06/25 13:33:22 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[12] at map at joins.scala:184)
14/06/25 13:33:22 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
14/06/25 13:33:22 INFO TaskSetManager: Starting task 1.0:0 as TID 2 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:22 INFO TaskSetManager: Serialized task 1.0:0 as 3705 bytes in 0 ms
14/06/25 13:33:22 INFO TaskSetManager: Starting task 1.0:1 as TID 3 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:22 INFO TaskSetManager: Serialized task 1.0:1 as 3705 bytes in 0 ms
14/06/25 13:33:26 INFO DAGScheduler: Completed ResultTask(1, 1)
14/06/25 13:33:26 INFO TaskSetManager: Finished TID 3 in 4178 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:26 INFO DAGScheduler: Completed ResultTask(1, 0)
14/06/25 13:33:26 INFO DAGScheduler: Stage 1 (collect at joins.scala:184) finished in 4.365 s
14/06/25 13:33:26 INFO SparkContext: Job finished: collect at joins.scala:184, took 4.437216848 s
14/06/25 13:33:26 INFO TaskSetManager: Finished TID 2 in 4278 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:26 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
14/06/25 13:33:27 INFO MemoryStore: ensureFreeSpace(31747896) called with curMem=134315303, maxMem=311387750
14/06/25 13:33:27 INFO MemoryStore: Block broadcast_6 stored as values to memory (estimated size 30.3 MB, free 138.6 MB)
14/06/25 13:33:28 INFO FileInputFormat: Total input paths to process : 1
14/06/25 13:33:28 INFO SparkContext: Starting job: collect at joins.scala:184
14/06/25 13:33:28 INFO DAGScheduler: Got job 2 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
14/06/25 13:33:28 INFO DAGScheduler: Final stage: Stage 2(collect at joins.scala:184)
14/06/25 13:33:28 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:28 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:28 INFO DAGScheduler: Submitting Stage 2 (MappedRDD[17] at map at joins.scala:184), which has no missing parents
14/06/25 13:33:28 INFO DAGScheduler: Submitting 2 missing tasks from Stage 2 (MappedRDD[17] at map at joins.scala:184)
14/06/25 13:33:28 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
14/06/25 13:33:28 INFO TaskSetManager: Starting task 2.0:0 as TID 4 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:28 INFO TaskSetManager: Serialized task 2.0:0 as 3627 bytes in 0 ms
14/06/25 13:33:28 INFO TaskSetManager: Starting task 2.0:1 as TID 5 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:28 INFO TaskSetManager: Serialized task 2.0:1 as 3627 bytes in 0 ms
14/06/25 13:33:30 INFO DAGScheduler: Completed ResultTask(2, 0)
14/06/25 13:33:30 INFO TaskSetManager: Finished TID 4 in 2285 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:30 INFO DAGScheduler: Completed ResultTask(2, 1)
14/06/25 13:33:30 INFO DAGScheduler: Stage 2 (collect at joins.scala:184) finished in 2.681 s
14/06/25 13:33:30 INFO SparkContext: Job finished: collect at joins.scala:184, took 2.743363946 s
14/06/25 13:33:30 INFO TaskSetManager: Finished TID 5 in 2610 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:30 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
14/06/25 13:33:33 INFO MemoryStore: ensureFreeSpace(49064048) called with curMem=166063199, maxMem=311387750
14/06/25 13:33:33 INFO MemoryStore: Block broadcast_7 stored as values to memory (estimated size 46.8 MB, free 91.8 MB)
14/06/25 13:33:34 INFO FileInputFormat: Total input paths to process : 1
14/06/25 13:33:34 INFO SparkContext: Starting job: collect at joins.scala:184
14/06/25 13:33:34 INFO DAGScheduler: Got job 3 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
14/06/25 13:33:34 INFO DAGScheduler: Final stage: Stage 3(collect at joins.scala:184)
14/06/25 13:33:34 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:34 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:34 INFO DAGScheduler: Submitting Stage 3 (MappedRDD[22] at map at joins.scala:184), which has no missing parents
14/06/25 13:33:34 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 (MappedRDD[22] at map at joins.scala:184)
14/06/25 13:33:34 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
14/06/25 13:33:34 INFO TaskSetManager: Starting task 3.0:0 as TID 6 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:34 INFO TaskSetManager: Serialized task 3.0:0 as 3708 bytes in 1 ms
14/06/25 13:33:34 INFO TaskSetManager: Starting task 3.0:1 as TID 7 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:34 INFO TaskSetManager: Serialized task 3.0:1 as 3708 bytes in 1 ms
14/06/25 13:33:35 INFO DAGScheduler: Completed ResultTask(3, 1)
14/06/25 13:33:35 INFO TaskSetManager: Finished TID 7 in 608 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:35 INFO DAGScheduler: Completed ResultTask(3, 0)
14/06/25 13:33:35 INFO DAGScheduler: Stage 3 (collect at joins.scala:184) finished in 0.636 s
14/06/25 13:33:35 INFO SparkContext: Job finished: collect at joins.scala:184, took 0.647360428 s
14/06/25 13:33:35 INFO TaskSetManager: Finished TID 6 in 627 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:35 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
14/06/25 13:33:35 INFO MemoryStore: ensureFreeSpace(4471280) called with curMem=215127247, maxMem=311387750
14/06/25 13:33:35 INFO MemoryStore: Block broadcast_8 stored as values to memory (estimated size 4.3 MB, free 87.5 MB)
14/06/25 13:33:35 INFO FileInputFormat: Total input paths to process : 1
14/06/25 13:33:35 INFO SparkContext: Starting job: count at joins.scala:216
14/06/25 13:33:35 INFO DAGScheduler: Got job 4 (count at joins.scala:216) with 2 output partitions (allowLocal=false)
14/06/25 13:33:35 INFO DAGScheduler: Final stage: Stage 4(count at joins.scala:216)
14/06/25 13:33:35 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:35 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:35 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[28] at map at joins.scala:214), which has no missing parents
14/06/25 13:33:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[28] at map at joins.scala:214)
14/06/25 13:33:36 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
14/06/25 13:33:36 INFO TaskSetManager: Starting task 4.0:0 as TID 8 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:36 INFO TaskSetManager: Serialized task 4.0:0 as 4570 bytes in 0 ms
14/06/25 13:33:36 INFO TaskSetManager: Starting task 4.0:1 as TID 9 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:36 INFO TaskSetManager: Serialized task 4.0:1 as 4570 bytes in 0 ms
14/06/25 13:33:47 INFO TaskSetManager: Finished TID 9 in 10869 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:47 INFO DAGScheduler: Completed ResultTask(4, 1)
14/06/25 13:33:47 INFO TaskSetManager: Finished TID 8 in 11240 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:47 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
14/06/25 13:33:47 INFO DAGScheduler: Completed ResultTask(4, 0)
14/06/25 13:33:47 INFO DAGScheduler: Stage 4 (count at joins.scala:216) finished in 11.191 s
14/06/25 13:33:47 INFO SparkContext: Job finished: count at joins.scala:216, took 12.351801444 s
14/06/25 13:33:47 INFO SparkContext: Starting job: reduce at joins.scala:219
14/06/25 13:33:47 INFO DAGScheduler: Got job 5 (reduce at joins.scala:219) with 2 output partitions (allowLocal=false)
14/06/25 13:33:47 INFO DAGScheduler: Final stage: Stage 5(reduce at joins.scala:219)
14/06/25 13:33:47 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:47 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:47 INFO DAGScheduler: Submitting Stage 5 (MappedRDD[29] at map at joins.scala:219), which has no missing parents
14/06/25 13:33:47 INFO DAGScheduler: Submitting 2 missing tasks from Stage 5 (MappedRDD[29] at map at joins.scala:219)
14/06/25 13:33:47 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
14/06/25 13:33:47 INFO TaskSetManager: Starting task 5.0:0 as TID 10 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:47 INFO TaskSetManager: Serialized task 5.0:0 as 4586 bytes in 0 ms
14/06/25 13:33:47 INFO TaskSetManager: Starting task 5.0:1 as TID 11 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:47 INFO TaskSetManager: Serialized task 5.0:1 as 4586 bytes in 0 ms
14/06/25 13:33:52 INFO TaskSetManager: Finished TID 10 in 4929 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:52 INFO DAGScheduler: Completed ResultTask(5, 0)
14/06/25 13:33:52 INFO TaskSetManager: Finished TID 11 in 5285 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:52 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
14/06/25 13:33:52 INFO DAGScheduler: Completed ResultTask(5, 1)
14/06/25 13:33:52 INFO DAGScheduler: Stage 5 (reduce at joins.scala:219) finished in 5.236 s
14/06/25 13:33:52 INFO SparkContext: Job finished: reduce at joins.scala:219, took 5.306035846 s
14/06/25 13:33:53 INFO SparkContext: Starting job: count at joins.scala:216
14/06/25 13:33:53 INFO DAGScheduler: Got job 6 (count at joins.scala:216) with 4 output partitions (allowLocal=false)
14/06/25 13:33:53 INFO DAGScheduler: Final stage: Stage 6(count at joins.scala:216)
14/06/25 13:33:53 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:53 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:53 INFO DAGScheduler: Submitting Stage 6 (MappedRDD[34] at map at joins.scala:214), which has no missing parents
14/06/25 13:33:53 INFO DAGScheduler: Submitting 4 missing tasks from Stage 6 (MappedRDD[34] at map at joins.scala:214)
14/06/25 13:33:53 INFO TaskSchedulerImpl: Adding task set 6.0 with 4 tasks
14/06/25 13:33:53 INFO TaskSetManager: Starting task 6.0:0 as TID 12 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:53 INFO TaskSetManager: Serialized task 6.0:0 as 17113 bytes in 2 ms
14/06/25 13:33:53 INFO TaskSetManager: Starting task 6.0:1 as TID 13 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:53 INFO TaskSetManager: Serialized task 6.0:1 as 17113 bytes in 4 ms
14/06/25 13:47:25 INFO TaskSetManager: Starting task 6.0:2 as TID 14 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:47:25 INFO TaskSetManager: Serialized task 6.0:2 as 5974 bytes in 101 ms
14/06/25 13:47:25 INFO TaskSetManager: Finished TID 12 in 812264 ms on 192.168.56.100 (progress: 1/4)
14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 0)
14/06/25 13:47:25 INFO TaskSetManager: Starting task 6.0:3 as TID 15 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:47:25 INFO TaskSetManager: Serialized task 6.0:3 as 5974 bytes in 6 ms
14/06/25 13:47:25 INFO TaskSetManager: Finished TID 14 in 462 ms on 192.168.56.100 (progress: 2/4)
14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 2)
14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 3)
14/06/25 13:47:25 INFO TaskSetManager: Finished TID 15 in 81 ms on 192.168.56.100 (progress: 3/4)
14/06/25 14:11:07 INFO TaskSetManager: Finished TID 13 in 2234716 ms on 192.168.56.100 (progress: 4/4)
14/06/25 14:11:07 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
14/06/25 14:11:07 INFO DAGScheduler: Completed ResultTask(6, 1)
14/06/25 14:11:07 INFO DAGScheduler: Stage 6 (count at joins.scala:216) finished in 2234.742 s
14/06/25 14:11:07 INFO SparkContext: Job finished: count at joins.scala:216, took 2234.877469214 s
14/06/25 14:11:08 INFO SparkContext: Starting job: reduce at joins.scala:219
14/06/25 14:11:08 INFO DAGScheduler: Got job 7 (reduce at joins.scala:219) with 4 output partitions (allowLocal=false)
14/06/25 14:11:08 INFO DAGScheduler: Final stage: Stage 7(reduce at joins.scala:219)
14/06/25 14:11:08 INFO DAGScheduler: Parents of final stage: List()
14/06/25 14:11:08 INFO DAGScheduler: Missing parents: List()
14/06/25 14:11:08 INFO DAGScheduler: Submitting Stage 7 (MappedRDD[35] at map at joins.scala:219), which has no missing parents
14/06/25 14:11:08 INFO DAGScheduler: Submitting 4 missing tasks from Stage 7 (MappedRDD[35] at map at joins.scala:219)
14/06/25 14:11:08 INFO TaskSchedulerImpl: Adding task set 7.0 with 4 tasks
14/06/25 14:11:08 INFO TaskSetManager: Starting task 7.0:0 as TID 16 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 14:11:08 INFO TaskSetManager: Serialized task 7.0:0 as 17130 bytes in 10 ms
14/06/25 14:11:08 INFO TaskSetManager: Starting task 7.0:1 as TID 17 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 14:11:08 INFO TaskSetManager: Serialized task 7.0:1 as 17130 bytes in 4 ms

3) All of these queries work fine on Shark 0.9.1.
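For reference, the chained left-outer-join shape of the query above can be sketched in plain Scala (no Spark), joining small in-memory key/value sequences; the table contents and keys here are illustrative, not the real schema:

```scala
// Minimal sketch of SQL left-outer-join semantics over in-memory data.
// A left outer join keeps every left-side row, attaching None when no
// right-side row matches the key.
def leftOuterJoin[A, B](left: Seq[(Int, A)], right: Seq[(Int, B)]): Seq[(Int, (A, Option[B]))] = {
  val byKey = right.groupBy(_._1)
  left.flatMap { case (k, a) =>
    byKey.get(k) match {
      case Some(matches) => matches.map { case (_, b) => (k, (a, Some(b))) }
      case None          => Seq((k, (a, None)))
    }
  }
}

val a = Seq(1 -> "a1", 2 -> "a2")
val b = Seq(1 -> "b1")   // only key 1 matches
val c = Seq(2 -> "c2")   // only key 2 matches

// Analogous to:
//   select * from a left outer join b on a.id=b.id
//            left outer join c on a.id=c.id
val joined = leftOuterJoin(leftOuterJoin(a, b), c)
// Row 1 keeps its b match but gets None for c; row 2 the reverse.
```

This is only a semantic sketch of what the multi-join query computes; the hang reported above is in Spark SQL's physical execution of such chained joins, not in the join semantics themselves.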

14/06/25 13:33:47 INFO SparkContext: Job finished: count at joins.scala:216, took 12.351801444 s
14/06/25 13:33:47 INFO SparkContext: Starting job: reduce at joins.scala:219
14/06/25 13:33:47 INFO DAGScheduler: Got job 5 (reduce at joins.scala:219) with 2 output partitions (allowLocal=false)
14/06/25 13:33:47 INFO DAGScheduler: Final stage: Stage 5(reduce at joins.scala:219)
14/06/25 13:33:47 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:47 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:47 INFO DAGScheduler: Submitting Stage 5 (MappedRDD[29] at map at joins.scala:219), which has no missing parents
14/06/25 13:33:47 INFO DAGScheduler: Submitting 2 missing tasks from Stage 5 (MappedRDD[29] at map at joins.scala:219)
14/06/25 13:33:47 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
14/06/25 13:33:47 INFO TaskSetManager: Starting task 5.0:0 as TID 10 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:47 INFO TaskSetManager: Serialized task 5.0:0 as 4586 bytes in 0 ms
14/06/25 13:33:47 INFO TaskSetManager: Starting task 5.0:1 as TID 11 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:47 INFO TaskSetManager: Serialized task 5.0:1 as 4586 bytes in 0 ms
14/06/25 13:33:52 INFO TaskSetManager: Finished TID 10 in 4929 ms on 192.168.56.100 (progress: 1/2)
14/06/25 13:33:52 INFO DAGScheduler: Completed ResultTask(5, 0)
14/06/25 13:33:52 INFO TaskSetManager: Finished TID 11 in 5285 ms on 192.168.56.100 (progress: 2/2)
14/06/25 13:33:52 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
14/06/25 13:33:52 INFO DAGScheduler: Completed ResultTask(5, 1)
14/06/25 13:33:52 INFO DAGScheduler: Stage 5 (reduce at joins.scala:219) finished in 5.236 s
14/06/25 13:33:52 INFO SparkContext: Job finished: reduce at joins.scala:219, took 5.306035846 s
14/06/25 13:33:53 INFO SparkContext: Starting job: count at joins.scala:216
14/06/25 13:33:53 INFO DAGScheduler: Got job 6 (count at joins.scala:216) with 4 output partitions (allowLocal=false)
14/06/25 13:33:53 INFO DAGScheduler: Final stage: Stage 6(count at joins.scala:216)
14/06/25 13:33:53 INFO DAGScheduler: Parents of final stage: List()
14/06/25 13:33:53 INFO DAGScheduler: Missing parents: List()
14/06/25 13:33:53 INFO DAGScheduler: Submitting Stage 6 (MappedRDD[34] at map at joins.scala:214), which has no missing parents
14/06/25 13:33:53 INFO DAGScheduler: Submitting 4 missing tasks from Stage 6 (MappedRDD[34] at map at joins.scala:214)
14/06/25 13:33:53 INFO TaskSchedulerImpl: Adding task set 6.0 with 4 tasks
14/06/25 13:33:53 INFO TaskSetManager: Starting task 6.0:0 as TID 12 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:53 INFO TaskSetManager: Serialized task 6.0:0 as 17113 bytes in 2 ms
14/06/25 13:33:53 INFO TaskSetManager: Starting task 6.0:1 as TID 13 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:33:53 INFO TaskSetManager: Serialized task 6.0:1 as 17113 bytes in 4 ms
14/06/25 13:47:25 INFO TaskSetManager: Starting task 6.0:2 as TID 14 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:47:25 INFO TaskSetManager: Serialized task 6.0:2 as 5974 bytes in 101 ms
14/06/25 13:47:25 INFO TaskSetManager: Finished TID 12 in 812264 ms on 192.168.56.100 (progress: 1/4)
14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 0)
14/06/25 13:47:25 INFO TaskSetManager: Starting task 6.0:3 as TID 15 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 13:47:25 INFO TaskSetManager: Serialized task 6.0:3 as 5974 bytes in 6 ms
14/06/25 13:47:25 INFO TaskSetManager: Finished TID 14 in 462 ms on 192.168.56.100 (progress: 2/4)
14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 2)
14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 3)
14/06/25 13:47:25 INFO TaskSetManager: Finished TID 15 in 81 ms on 192.168.56.100 (progress: 3/4)
14/06/25 14:11:07 INFO TaskSetManager: Finished TID 13 in 2234716 ms on 192.168.56.100 (progress: 4/4)
14/06/25 14:11:07 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
14/06/25 14:11:07 INFO DAGScheduler: Completed ResultTask(6, 1)
14/06/25 14:11:07 INFO DAGScheduler: Stage 6 (count at joins.scala:216) finished in 2234.742 s
14/06/25 14:11:07 INFO SparkContext: Job finished: count at joins.scala:216, took 2234.877469214 s
14/06/25 14:11:08 INFO SparkContext: Starting job: reduce at joins.scala:219
14/06/25 14:11:08 INFO DAGScheduler: Got job 7 (reduce at joins.scala:219) with 4 output partitions (allowLocal=false)
14/06/25 14:11:08 INFO DAGScheduler: Final stage: Stage 7(reduce at joins.scala:219)
14/06/25 14:11:08 INFO DAGScheduler: Parents of final stage: List()
14/06/25 14:11:08 INFO DAGScheduler: Missing parents: List()
14/06/25 14:11:08 INFO DAGScheduler: Submitting Stage 7 (MappedRDD[35] at map at joins.scala:219), which has no missing parents
14/06/25 14:11:08 INFO DAGScheduler: Submitting 4 missing tasks from Stage 7 (MappedRDD[35] at map at joins.scala:219)
14/06/25 14:11:08 INFO TaskSchedulerImpl: Adding task set 7.0 with 4 tasks
14/06/25 14:11:08 INFO TaskSetManager: Starting task 7.0:0 as TID 16 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 14:11:08 INFO TaskSetManager: Serialized task 7.0:0 as 17130 bytes in 10 ms
14/06/25 14:11:08 INFO TaskSetManager: Starting task 7.0:1 as TID 17 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
14/06/25 14:11:08 INFO TaskSetManager: Serialized task 7.0:1 as 17130 bytes in 4 ms



> spark SQL query hang up sometime
> --------------------------------
>
>                 Key: SPARK-2274
>                 URL: https://issues.apache.org/jira/browse/SPARK-2274
>             Project: Spark
>          Issue Type: Question
>         Environment: spark 1.0.0
>            Reporter: jackielihf
>
> When I run a Spark SQL query, it hangs sometimes:
> 1) a simple SQL query works, such as "select * from a left outer join b on a.id=b.id"
> 2) BUT if it has more joins, such as "select * from a left outer join b on a.id=b.id left outer join c on a.id=c.id...", the Spark shell seems to hang.
> The Spark shell prints:
> scala> hc.hql("select A.id,A.tit,A.sub_tit,B.abst,B.cont,A.aut,A.com_name,A.med_name,A.pub_dt,A.upd_time,A.ent_time,A.info_lvl,A.is_pic,A.lnk_addr,A.is_ann,A.info_open_lvl,A.keyw_name,C.typ_code as type,D.evt_dir,D.evt_st,E.secu_id,E.typ_codeii,E.exch_code,E.trd_code,E.secu_sht from txt_nws_bas_update A left outer join txt_nws_bas_txt B on A.id=B.orig_id left outer join txt_nws_typ C on A.id=C.orig_id left outer join txt_nws_secu D on A.id=D.orig_id left outer join bas_secu_info E on D.secu_id=E.secu_id where D.secu_id is not null limit 5").foreach(println)
> 14/06/25 13:32:25 INFO ParseDriver: Parsing command: select A.id,A.tit,A.sub_tit,B.abst,B.cont,A.aut,A.com_name,A.med_name,A.pub_dt,A.upd_time,A.ent_time,A.info_lvl,A.is_pic,A.lnk_addr,A.is_ann,A.info_open_lvl,A.keyw_name,C.typ_code as type,D.evt_dir,D.evt_st,E.secu_id,E.typ_codeii,E.exch_code,E.trd_code,E.secu_sht from txt_nws_bas_update A left outer join txt_nws_bas_txt B on A.id=B.orig_id left outer join txt_nws_typ C on A.id=C.orig_id left outer join txt_nws_secu D on A.id=D.orig_id left outer join bas_secu_info E on D.secu_id=E.secu_id where D.secu_id is not null limit 5
> 14/06/25 13:32:25 INFO ParseDriver: Parse Completed
> 14/06/25 13:32:25 INFO Analyzer: Max iterations (2) reached for batch MultiInstanceRelations
> 14/06/25 13:32:25 INFO Analyzer: Max iterations (2) reached for batch CaseInsensitiveAttributeReferences
> 14/06/25 13:32:27 INFO MemoryStore: ensureFreeSpace(220923) called with curMem=0, maxMem=311387750
> 14/06/25 13:32:27 INFO MemoryStore: Block broadcast_0 stored as values to memory (estimated size 215.7 KB, free 296.8 MB)
> 14/06/25 13:32:27 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=220923, maxMem=311387750
> 14/06/25 13:32:27 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 215.8 KB, free 296.5 MB)
> 14/06/25 13:32:28 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=441894, maxMem=311387750
> 14/06/25 13:32:28 INFO MemoryStore: Block broadcast_2 stored as values to memory (estimated size 215.8 KB, free 296.3 MB)
> 14/06/25 13:32:28 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=662865, maxMem=311387750
> 14/06/25 13:32:28 INFO MemoryStore: Block broadcast_3 stored as values to memory (estimated size 215.8 KB, free 296.1 MB)
> 14/06/25 13:32:28 INFO MemoryStore: ensureFreeSpace(220971) called with curMem=883836, maxMem=311387750
> 14/06/25 13:32:28 INFO MemoryStore: Block broadcast_4 stored as values to memory (estimated size 215.8 KB, free 295.9 MB)
> 14/06/25 13:32:29 INFO SQLContext$$anon$1: Max iterations (2) reached for batch Add exchange
> 14/06/25 13:32:29 INFO SQLContext$$anon$1: Max iterations (2) reached for batch Prepare Expressions
> 14/06/25 13:32:30 INFO FileInputFormat: Total input paths to process : 1
> 14/06/25 13:32:30 INFO SparkContext: Starting job: collect at joins.scala:184
> 14/06/25 13:32:30 INFO DAGScheduler: Got job 0 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
> 14/06/25 13:32:30 INFO DAGScheduler: Final stage: Stage 0(collect at joins.scala:184)
> 14/06/25 13:32:30 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:32:30 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:32:30 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[7] at map at joins.scala:184), which has no missing parents
> 14/06/25 13:32:30 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[7] at map at joins.scala:184)
> 14/06/25 13:32:30 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
> 14/06/25 13:32:30 INFO TaskSetManager: Starting task 0.0:0 as TID 0 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:32:30 INFO TaskSetManager: Serialized task 0.0:0 as 4088 bytes in 3 ms
> 14/06/25 13:32:30 INFO TaskSetManager: Starting task 0.0:1 as TID 1 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:32:30 INFO TaskSetManager: Serialized task 0.0:1 as 4088 bytes in 2 ms
> 14/06/25 13:32:44 INFO BlockManagerInfo: Added taskresult_1 in memory on 192.168.56.100:47102 (size: 15.0 MB, free: 281.9 MB)
> 14/06/25 13:32:44 INFO SendingConnection: Initiating connection to [/192.168.56.100:47102]
> 14/06/25 13:32:44 INFO SendingConnection: Connected to [/192.168.56.100:47102], 1 messages pending
> 14/06/25 13:32:44 INFO BlockManagerInfo: Added taskresult_0 in memory on 192.168.56.100:41322 (size: 15.1 MB, free: 281.9 MB)
> 14/06/25 13:32:44 INFO SendingConnection: Initiating connection to [/192.168.56.100:41322]
> 14/06/25 13:32:44 INFO SendingConnection: Connected to [/192.168.56.100:41322], 1 messages pending
> 14/06/25 13:32:49 INFO ConnectionManager: Accepted connection from [192.168.56.100/192.168.56.100]
> 14/06/25 13:32:55 INFO BlockManagerInfo: Removed taskresult_1 on 192.168.56.100:47102 in memory (size: 15.0 MB, free: 297.0 MB)
> 14/06/25 13:32:55 INFO DAGScheduler: Completed ResultTask(0, 1)
> 14/06/25 13:32:55 INFO TaskSetManager: Finished TID 1 in 24749 ms on 192.168.56.100 (progress: 1/2)
> 14/06/25 13:33:10 INFO ConnectionManager: Accepted connection from [192.168.56.100/192.168.56.100]
> 14/06/25 13:33:15 INFO BlockManagerInfo: Removed taskresult_0 on 192.168.56.100:41322 in memory (size: 15.1 MB, free: 297.0 MB)
> 14/06/25 13:33:15 INFO DAGScheduler: Completed ResultTask(0, 0)
> 14/06/25 13:33:15 INFO TaskSetManager: Finished TID 0 in 45025 ms on 192.168.56.100 (progress: 2/2)
> 14/06/25 13:33:15 INFO DAGScheduler: Stage 0 (collect at joins.scala:184) finished in 45.665 s
> 14/06/25 13:33:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
> 14/06/25 13:33:16 INFO SparkContext: Job finished: collect at joins.scala:184, took 45.910877269 s
> 14/06/25 13:33:20 INFO MemoryStore: ensureFreeSpace(133210496) called with curMem=1104807, maxMem=311387750
> 14/06/25 13:33:20 INFO MemoryStore: Block broadcast_5 stored as values to memory (estimated size 127.0 MB, free 168.9 MB)
> 14/06/25 13:33:22 INFO FileInputFormat: Total input paths to process : 1
> 14/06/25 13:33:22 INFO SparkContext: Starting job: collect at joins.scala:184
> 14/06/25 13:33:22 INFO DAGScheduler: Got job 1 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
> 14/06/25 13:33:22 INFO DAGScheduler: Final stage: Stage 1(collect at joins.scala:184)
> 14/06/25 13:33:22 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:33:22 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:33:22 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[12] at map at joins.scala:184), which has no missing parents
> 14/06/25 13:33:22 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[12] at map at joins.scala:184)
> 14/06/25 13:33:22 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
> 14/06/25 13:33:22 INFO TaskSetManager: Starting task 1.0:0 as TID 2 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:22 INFO TaskSetManager: Serialized task 1.0:0 as 3705 bytes in 0 ms
> 14/06/25 13:33:22 INFO TaskSetManager: Starting task 1.0:1 as TID 3 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:22 INFO TaskSetManager: Serialized task 1.0:1 as 3705 bytes in 0 ms
> 14/06/25 13:33:26 INFO DAGScheduler: Completed ResultTask(1, 1)
> 14/06/25 13:33:26 INFO TaskSetManager: Finished TID 3 in 4178 ms on 192.168.56.100 (progress: 1/2)
> 14/06/25 13:33:26 INFO DAGScheduler: Completed ResultTask(1, 0)
> 14/06/25 13:33:26 INFO DAGScheduler: Stage 1 (collect at joins.scala:184) finished in 4.365 s
> 14/06/25 13:33:26 INFO SparkContext: Job finished: collect at joins.scala:184, took 4.437216848 s
> 14/06/25 13:33:26 INFO TaskSetManager: Finished TID 2 in 4278 ms on 192.168.56.100 (progress: 2/2)
> 14/06/25 13:33:26 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
> 14/06/25 13:33:27 INFO MemoryStore: ensureFreeSpace(31747896) called with curMem=134315303, maxMem=311387750
> 14/06/25 13:33:27 INFO MemoryStore: Block broadcast_6 stored as values to memory (estimated size 30.3 MB, free 138.6 MB)
> 14/06/25 13:33:28 INFO FileInputFormat: Total input paths to process : 1
> 14/06/25 13:33:28 INFO SparkContext: Starting job: collect at joins.scala:184
> 14/06/25 13:33:28 INFO DAGScheduler: Got job 2 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
> 14/06/25 13:33:28 INFO DAGScheduler: Final stage: Stage 2(collect at joins.scala:184)
> 14/06/25 13:33:28 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:33:28 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:33:28 INFO DAGScheduler: Submitting Stage 2 (MappedRDD[17] at map at joins.scala:184), which has no missing parents
> 14/06/25 13:33:28 INFO DAGScheduler: Submitting 2 missing tasks from Stage 2 (MappedRDD[17] at map at joins.scala:184)
> 14/06/25 13:33:28 INFO TaskSchedulerImpl: Adding task set 2.0 with 2 tasks
> 14/06/25 13:33:28 INFO TaskSetManager: Starting task 2.0:0 as TID 4 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:28 INFO TaskSetManager: Serialized task 2.0:0 as 3627 bytes in 0 ms
> 14/06/25 13:33:28 INFO TaskSetManager: Starting task 2.0:1 as TID 5 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:28 INFO TaskSetManager: Serialized task 2.0:1 as 3627 bytes in 0 ms
> 14/06/25 13:33:30 INFO DAGScheduler: Completed ResultTask(2, 0)
> 14/06/25 13:33:30 INFO TaskSetManager: Finished TID 4 in 2285 ms on 192.168.56.100 (progress: 1/2)
> 14/06/25 13:33:30 INFO DAGScheduler: Completed ResultTask(2, 1)
> 14/06/25 13:33:30 INFO DAGScheduler: Stage 2 (collect at joins.scala:184) finished in 2.681 s
> 14/06/25 13:33:30 INFO SparkContext: Job finished: collect at joins.scala:184, took 2.743363946 s
> 14/06/25 13:33:30 INFO TaskSetManager: Finished TID 5 in 2610 ms on 192.168.56.100 (progress: 2/2)
> 14/06/25 13:33:30 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
> 14/06/25 13:33:33 INFO MemoryStore: ensureFreeSpace(49064048) called with curMem=166063199, maxMem=311387750
> 14/06/25 13:33:33 INFO MemoryStore: Block broadcast_7 stored as values to memory (estimated size 46.8 MB, free 91.8 MB)
> 14/06/25 13:33:34 INFO FileInputFormat: Total input paths to process : 1
> 14/06/25 13:33:34 INFO SparkContext: Starting job: collect at joins.scala:184
> 14/06/25 13:33:34 INFO DAGScheduler: Got job 3 (collect at joins.scala:184) with 2 output partitions (allowLocal=false)
> 14/06/25 13:33:34 INFO DAGScheduler: Final stage: Stage 3(collect at joins.scala:184)
> 14/06/25 13:33:34 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:33:34 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:33:34 INFO DAGScheduler: Submitting Stage 3 (MappedRDD[22] at map at joins.scala:184), which has no missing parents
> 14/06/25 13:33:34 INFO DAGScheduler: Submitting 2 missing tasks from Stage 3 (MappedRDD[22] at map at joins.scala:184)
> 14/06/25 13:33:34 INFO TaskSchedulerImpl: Adding task set 3.0 with 2 tasks
> 14/06/25 13:33:34 INFO TaskSetManager: Starting task 3.0:0 as TID 6 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:34 INFO TaskSetManager: Serialized task 3.0:0 as 3708 bytes in 1 ms
> 14/06/25 13:33:34 INFO TaskSetManager: Starting task 3.0:1 as TID 7 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:34 INFO TaskSetManager: Serialized task 3.0:1 as 3708 bytes in 1 ms
> 14/06/25 13:33:35 INFO DAGScheduler: Completed ResultTask(3, 1)
> 14/06/25 13:33:35 INFO TaskSetManager: Finished TID 7 in 608 ms on 192.168.56.100 (progress: 1/2)
> 14/06/25 13:33:35 INFO DAGScheduler: Completed ResultTask(3, 0)
> 14/06/25 13:33:35 INFO DAGScheduler: Stage 3 (collect at joins.scala:184) finished in 0.636 s
> 14/06/25 13:33:35 INFO SparkContext: Job finished: collect at joins.scala:184, took 0.647360428 s
> 14/06/25 13:33:35 INFO TaskSetManager: Finished TID 6 in 627 ms on 192.168.56.100 (progress: 2/2)
> 14/06/25 13:33:35 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
> 14/06/25 13:33:35 INFO MemoryStore: ensureFreeSpace(4471280) called with curMem=215127247, maxMem=311387750
> 14/06/25 13:33:35 INFO MemoryStore: Block broadcast_8 stored as values to memory (estimated size 4.3 MB, free 87.5 MB)
> 14/06/25 13:33:35 INFO FileInputFormat: Total input paths to process : 1
> 14/06/25 13:33:35 INFO SparkContext: Starting job: count at joins.scala:216
> 14/06/25 13:33:35 INFO DAGScheduler: Got job 4 (count at joins.scala:216) with 2 output partitions (allowLocal=false)
> 14/06/25 13:33:35 INFO DAGScheduler: Final stage: Stage 4(count at joins.scala:216)
> 14/06/25 13:33:35 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:33:35 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:33:35 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[28] at map at joins.scala:214), which has no missing parents
> 14/06/25 13:33:36 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[28] at map at joins.scala:214)
> 14/06/25 13:33:36 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
> 14/06/25 13:33:36 INFO TaskSetManager: Starting task 4.0:0 as TID 8 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:36 INFO TaskSetManager: Serialized task 4.0:0 as 4570 bytes in 0 ms
> 14/06/25 13:33:36 INFO TaskSetManager: Starting task 4.0:1 as TID 9 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:36 INFO TaskSetManager: Serialized task 4.0:1 as 4570 bytes in 0 ms
> 14/06/25 13:33:47 INFO TaskSetManager: Finished TID 9 in 10869 ms on 192.168.56.100 (progress: 1/2)
> 14/06/25 13:33:47 INFO DAGScheduler: Completed ResultTask(4, 1)
> 14/06/25 13:33:47 INFO TaskSetManager: Finished TID 8 in 11240 ms on 192.168.56.100 (progress: 2/2)
> 14/06/25 13:33:47 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
> 14/06/25 13:33:47 INFO DAGScheduler: Completed ResultTask(4, 0)
> 14/06/25 13:33:47 INFO DAGScheduler: Stage 4 (count at joins.scala:216) finished in 11.191 s
> 14/06/25 13:33:47 INFO SparkContext: Job finished: count at joins.scala:216, took 12.351801444 s
> 14/06/25 13:33:47 INFO SparkContext: Starting job: reduce at joins.scala:219
> 14/06/25 13:33:47 INFO DAGScheduler: Got job 5 (reduce at joins.scala:219) with 2 output partitions (allowLocal=false)
> 14/06/25 13:33:47 INFO DAGScheduler: Final stage: Stage 5(reduce at joins.scala:219)
> 14/06/25 13:33:47 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:33:47 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:33:47 INFO DAGScheduler: Submitting Stage 5 (MappedRDD[29] at map at joins.scala:219), which has no missing parents
> 14/06/25 13:33:47 INFO DAGScheduler: Submitting 2 missing tasks from Stage 5 (MappedRDD[29] at map at joins.scala:219)
> 14/06/25 13:33:47 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
> 14/06/25 13:33:47 INFO TaskSetManager: Starting task 5.0:0 as TID 10 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:47 INFO TaskSetManager: Serialized task 5.0:0 as 4586 bytes in 0 ms
> 14/06/25 13:33:47 INFO TaskSetManager: Starting task 5.0:1 as TID 11 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:47 INFO TaskSetManager: Serialized task 5.0:1 as 4586 bytes in 0 ms
> 14/06/25 13:33:52 INFO TaskSetManager: Finished TID 10 in 4929 ms on 192.168.56.100 (progress: 1/2)
> 14/06/25 13:33:52 INFO DAGScheduler: Completed ResultTask(5, 0)
> 14/06/25 13:33:52 INFO TaskSetManager: Finished TID 11 in 5285 ms on 192.168.56.100 (progress: 2/2)
> 14/06/25 13:33:52 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
> 14/06/25 13:33:52 INFO DAGScheduler: Completed ResultTask(5, 1)
> 14/06/25 13:33:52 INFO DAGScheduler: Stage 5 (reduce at joins.scala:219) finished in 5.236 s
> 14/06/25 13:33:52 INFO SparkContext: Job finished: reduce at joins.scala:219, took 5.306035846 s
> 14/06/25 13:33:53 INFO SparkContext: Starting job: count at joins.scala:216
> 14/06/25 13:33:53 INFO DAGScheduler: Got job 6 (count at joins.scala:216) with 4 output partitions (allowLocal=false)
> 14/06/25 13:33:53 INFO DAGScheduler: Final stage: Stage 6(count at joins.scala:216)
> 14/06/25 13:33:53 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 13:33:53 INFO DAGScheduler: Missing parents: List()
> 14/06/25 13:33:53 INFO DAGScheduler: Submitting Stage 6 (MappedRDD[34] at map at joins.scala:214), which has no missing parents
> 14/06/25 13:33:53 INFO DAGScheduler: Submitting 4 missing tasks from Stage 6 (MappedRDD[34] at map at joins.scala:214)
> 14/06/25 13:33:53 INFO TaskSchedulerImpl: Adding task set 6.0 with 4 tasks
> 14/06/25 13:33:53 INFO TaskSetManager: Starting task 6.0:0 as TID 12 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:53 INFO TaskSetManager: Serialized task 6.0:0 as 17113 bytes in 2 ms
> 14/06/25 13:33:53 INFO TaskSetManager: Starting task 6.0:1 as TID 13 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:33:53 INFO TaskSetManager: Serialized task 6.0:1 as 17113 bytes in 4 ms
> 14/06/25 13:47:25 INFO TaskSetManager: Starting task 6.0:2 as TID 14 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:47:25 INFO TaskSetManager: Serialized task 6.0:2 as 5974 bytes in 101 ms
> 14/06/25 13:47:25 INFO TaskSetManager: Finished TID 12 in 812264 ms on 192.168.56.100 (progress: 1/4)
> 14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 0)
> 14/06/25 13:47:25 INFO TaskSetManager: Starting task 6.0:3 as TID 15 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 13:47:25 INFO TaskSetManager: Serialized task 6.0:3 as 5974 bytes in 6 ms
> 14/06/25 13:47:25 INFO TaskSetManager: Finished TID 14 in 462 ms on 192.168.56.100 (progress: 2/4)
> 14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 2)
> 14/06/25 13:47:25 INFO DAGScheduler: Completed ResultTask(6, 3)
> 14/06/25 13:47:25 INFO TaskSetManager: Finished TID 15 in 81 ms on 192.168.56.100 (progress: 3/4)
> 14/06/25 14:11:07 INFO TaskSetManager: Finished TID 13 in 2234716 ms on 192.168.56.100 (progress: 4/4)
> 14/06/25 14:11:07 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
> 14/06/25 14:11:07 INFO DAGScheduler: Completed ResultTask(6, 1)
> 14/06/25 14:11:07 INFO DAGScheduler: Stage 6 (count at joins.scala:216) finished in 2234.742 s
> 14/06/25 14:11:07 INFO SparkContext: Job finished: count at joins.scala:216, took 2234.877469214 s
> 14/06/25 14:11:08 INFO SparkContext: Starting job: reduce at joins.scala:219
> 14/06/25 14:11:08 INFO DAGScheduler: Got job 7 (reduce at joins.scala:219) with 4 output partitions (allowLocal=false)
> 14/06/25 14:11:08 INFO DAGScheduler: Final stage: Stage 7(reduce at joins.scala:219)
> 14/06/25 14:11:08 INFO DAGScheduler: Parents of final stage: List()
> 14/06/25 14:11:08 INFO DAGScheduler: Missing parents: List()
> 14/06/25 14:11:08 INFO DAGScheduler: Submitting Stage 7 (MappedRDD[35] at map at joins.scala:219), which has no missing parents
> 14/06/25 14:11:08 INFO DAGScheduler: Submitting 4 missing tasks from Stage 7 (MappedRDD[35] at map at joins.scala:219)
> 14/06/25 14:11:08 INFO TaskSchedulerImpl: Adding task set 7.0 with 4 tasks
> 14/06/25 14:11:08 INFO TaskSetManager: Starting task 7.0:0 as TID 16 on executor 0: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 14:11:08 INFO TaskSetManager: Serialized task 7.0:0 as 17130 bytes in 10 ms
> 14/06/25 14:11:08 INFO TaskSetManager: Starting task 7.0:1 as TID 17 on executor 1: 192.168.56.100 (PROCESS_LOCAL)
> 14/06/25 14:11:08 INFO TaskSetManager: Serialized task 7.0:1 as 17130 bytes in 4 ms
> 3) all the queries work fine on Shark 0.9.1



--
This message was sent by Atlassian JIRA
(v6.2#6252)