Posted to issues@spark.apache.org by "Simeon Simeonov (JIRA)" <ji...@apache.org> on 2015/08/05 05:30:04 UTC

[jira] [Created] (SPARK-9625) SparkILoop creates sql context continuously, thousands of times

Simeon Simeonov created SPARK-9625:
--------------------------------------

             Summary: SparkILoop creates sql context continuously, thousands of times
                 Key: SPARK-9625
                 URL: https://issues.apache.org/jira/browse/SPARK-9625
             Project: Spark
          Issue Type: Bug
          Components: Spark Shell
    Affects Versions: 1.4.1
         Environment: Ubuntu on AWS
            Reporter: Simeon Simeonov


Occasionally but reproducibly, depending on the Spark SQL operations being run, {{spark-shell}} gets into a funk where it attempts to create a sql context over and over again while doing its work. Example output below:

{code}
15/08/05 03:04:12 INFO DAGScheduler: looking for newly runnable stages
15/08/05 03:04:12 INFO DAGScheduler: running: Set()
15/08/05 03:04:12 INFO DAGScheduler: waiting: Set(ShuffleMapStage 7, ResultStage 8)
15/08/05 03:04:12 INFO DAGScheduler: failed: Set()
15/08/05 03:04:12 INFO DAGScheduler: Missing parents for ShuffleMapStage 7: List()
15/08/05 03:04:12 INFO DAGScheduler: Missing parents for ResultStage 8: List(ShuffleMapStage 7)
15/08/05 03:04:12 INFO DAGScheduler: Submitting ShuffleMapStage 7 (MapPartitionsRDD[49] at map at <console>:474), which is now runnable
15/08/05 03:04:12 INFO MemoryStore: ensureFreeSpace(47840) called with curMem=685306, maxMem=26671746908
15/08/05 03:04:12 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 46.7 KB, free 24.8 GB)
15/08/05 03:04:12 INFO MemoryStore: ensureFreeSpace(15053) called with curMem=733146, maxMem=26671746908
15/08/05 03:04:12 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 14.7 KB, free 24.8 GB)
15/08/05 03:04:12 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on localhost:39451 (size: 14.7 KB, free: 24.8 GB)
15/08/05 03:04:12 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:874
15/08/05 03:04:12 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 7 (MapPartitionsRDD[49] at map at <console>:474)
15/08/05 03:04:12 INFO TaskSchedulerImpl: Adding task set 7.0 with 1 tasks
15/08/05 03:04:12 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 684, localhost, PROCESS_LOCAL, 1461 bytes)
15/08/05 03:04:12 INFO Executor: Running task 0.0 in stage 7.0 (TID 684)
15/08/05 03:04:12 INFO ShuffleBlockFetcherIterator: Getting 214 non-empty blocks out of 214 blocks
15/08/05 03:04:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms
15/08/05 03:04:12 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO HiveMetaStore: No user is added in admin role, since config is empty
15/08/05 03:04:13 INFO SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
15/08/05 03:04:13 INFO HiveContext: Initializing execution hive, version 0.13.1
15/08/05 03:04:13 INFO SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.
{code}

In the end, the job succeeded and produced correct output, but it ran many times slower than it should have.

Longer output, showing job completion, can be found [here|https://gist.github.com/ssimeonov/8064f5f77897bb337157]. In that run, the sql context was created about 2,500 times; I've had runs where it was created over 100,000 times.
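
A quick way to quantify the problem from a captured log (a minimal sketch, assuming the shell output was redirected to a file named {{spark-shell.log}}, which is a made-up path; any saved copy of the output works):

{code}
// Counts how many times the shell re-created the sql context, by scanning
// a saved copy of the spark-shell output for the log line shown above.
import scala.io.Source

val n = Source.fromFile("spark-shell.log").getLines()
  .count(_.contains("SparkILoop: Created sql context"))
println(s"sql context created $n times")
{code}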

Could it be that an exception silently generated by the job is causing this?
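
To illustrate what I suspect is happening (purely a hypothetical sketch, not actual SparkILoop code; {{runStep}} and {{createSqlContext}} are made-up placeholders): if some loop silently swallows an exception and re-initializes the context on every retry, it would produce exactly the alternating log lines above while still eventually succeeding.

{code}
// Hypothetical sketch of the suspected failure mode; NOT SparkILoop source.
// runStep and createSqlContext stand in for the real logic.
def runWithSilentRetries(runStep: () => Unit,
                         createSqlContext: () => Unit,
                         maxRetries: Int = 100000): Unit = {
  var attempts = 0
  var done = false
  while (!done && attempts < maxRetries) {
    try {
      runStep()
      done = true // the job eventually succeeds, as in the run above
    } catch {
      case _: Throwable =>
        attempts += 1
        // exception swallowed silently; the context is re-created on every
        // retry, emitting "Created sql context (with Hive support).."
        createSqlContext()
    }
  }
}
{code}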




