Posted to dev@reef.apache.org by Saikat Kanjilal <sx...@gmail.com> on 2017/10/10 16:20:26 UTC

The plan for reef-1791

Good morning, REEF dev community.

I wanted to share some thoughts on how I think we should move forward with
the implementation of reef-runtime-spark:



   1. I have completed my first cut of the code based on discussions with
   Sergiy and am ready to test it; I will do so both locally and on either
   HDInsight or a VM with Spark and Hadoop running on YARN.
   2. Testing will take some time, as we need to work out all the bugs
   that come up while coordinating events between REEF and the Spark
   containers.
   3. Next week I will test this on my Mac, running the Spark binaries on
   Hadoop locally.
   4. Towards the end of the month I will transition to testing on AWS,
   specifically running Spark on EMR and REEF on that setup; I think
   running REEF on AWS/EMR is a big plus and will enable more users to run
   Spark on REEF.
   5. I was going to wait to put out a code review until the first
   successful tests go through; to reiterate, the goal for the first phase
   is simply to run HelloREEF on Spark (see the sketch after this list).
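
For context, here is a minimal sketch of what that first-phase goal looks
like from the client side. It is essentially the existing HelloREEF
example; the Spark-specific part is hypothetical and flagged in the
comments, since in principle only the runtime configuration should need
to change:

import org.apache.reef.client.DriverConfiguration;
import org.apache.reef.client.DriverLauncher;
import org.apache.reef.client.LauncherStatus;
import org.apache.reef.examples.hello.HelloDriver;
import org.apache.reef.runtime.local.client.LocalRuntimeConfiguration;
import org.apache.reef.tang.Configuration;
import org.apache.reef.util.EnvironmentUtils;

public final class HelloReefSparkSketch {
  public static void main(final String[] args) throws Exception {
    // Today: the local runtime. With reef-runtime-spark, only these lines
    // should change, to whatever (hypothetical, not-yet-merged) Spark
    // runtime configuration the new module ends up exposing.
    final Configuration runtimeConf = LocalRuntimeConfiguration.CONF
        .set(LocalRuntimeConfiguration.MAX_NUMBER_OF_EVALUATORS, 2)
        .build();

    // The HelloREEF driver configuration stays the same across runtimes.
    final Configuration driverConf = DriverConfiguration.CONF
        .set(DriverConfiguration.GLOBAL_LIBRARIES,
             EnvironmentUtils.getClassLocation(HelloDriver.class))
        .set(DriverConfiguration.DRIVER_IDENTIFIER, "HelloREEF")
        .set(DriverConfiguration.ON_DRIVER_STARTED, HelloDriver.StartHandler.class)
        .set(DriverConfiguration.ON_EVALUATOR_ALLOCATED,
             HelloDriver.EvaluatorAllocatedHandler.class)
        .build();

    final LauncherStatus status =
        DriverLauncher.getLauncher(runtimeConf).run(driverConf, 60000);
    System.out.println("REEF job completed: " + status);
  }
}

If the runtime does its job, the HelloREEF driver and task code should not
need to change at all; only the runtime configuration does.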





If you have any concerns or feedback on this plan, do let me know. As I
mentioned in JIRA, I would really like to see us move to Java 8 sooner
rather than later; it'll make the development of reef-runtime-spark a lot
simpler.



Thanks in advance for your help.

Re: The plan for reef-1791

Posted by Saikat Kanjilal <sx...@gmail.com>.
And finally, drumroll please: the PR for the first version of
reef-runtime-spark, running on a single-node YARN install, is here:
https://github.com/apache/reef/pull/1404/

Please review and give detailed feedback; I'm really looking forward to
making this a success with the help of the community.
Regards


Re: The plan for reef-1791

Posted by Saikat Kanjilal <sx...@gmail.com>.
A quick update on this week's efforts on REEF-1791. Here's a successful
run of the LineCounter program over a POM file in the REEF repository,
running on a single-node Hadoop YARN install on my laptop:

@Rogan Carr: after I submit the PR we can divvy up further testing
efforts, including testing on HDInsight with a variety of programs
(beyond the simple LineCounter).

Saikats-MacBook-Pro:reef skanjila$ ./bin/run.sh
org.apache.reef.examples.data.loading.DataLoadingREEFOnSpark
java -cp
/Applications/hadoop-2.7.4/etc/hadoop:/Users/skanjila/code/opensource/reef/lang/java/reef-examples/target/reef-examples-0.17.0-SNAPSHOT-shaded.jar:/Applications/hadoop-2.7.4/share/hadoop/common/*:/Applications/hadoop-2.7.4/share/hadoop/common/lib/*:/Applications/hadoop-2.7.4/share/hadoop/yarn/*:/Applications/hadoop-2.7.4/share/hadoop/hdfs/*:/Applications/hadoop-2.7.4/share/hadoop/mapreduce/lib/*:/Applications/hadoop-2.7.4/share/hadoop/mapreduce/*
-Djava.util.logging.config.class=org.apache.reef.util.logging.Config
org.apache.reef.examples.data.loading.DataLoadingREEFOnSpark
2017-10-25 20:47:29,132 INFO
reef.examples.data.loading.DataLoadingREEFOnSpark.main main | Running Data
Loading reef on spark demo on the local runtime
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/Users/skanjila/code/opensource/reef/lang/java/reef-examples/target/reef-examples-0.17.0-SNAPSHOT-shaded.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/Applications/hadoop-2.7.4/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/10/25 20:47:30 INFO spark.SparkContext: Running Spark version 2.1.0
17/10/25 20:47:30 WARN spark.SparkContext: Support for Scala 2.10 is
deprecated as of Spark 2.1.0
17/10/25 20:47:30 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
17/10/25 20:47:35 INFO spark.SecurityManager: Changing view acls to:
skanjila
17/10/25 20:47:35 INFO spark.SecurityManager: Changing modify acls to:
skanjila
17/10/25 20:47:35 INFO spark.SecurityManager: Changing view acls groups to:
17/10/25 20:47:35 INFO spark.SecurityManager: Changing modify acls groups
to:
17/10/25 20:47:35 INFO spark.SecurityManager: SecurityManager:
authentication disabled; ui acls disabled; users  with view permissions:
Set(skanjila); groups with view permissions: Set(); users  with modify
permissions: Set(skanjila); groups with modify permissions: Set()
17/10/25 20:47:36 INFO util.Utils: Successfully started service
'sparkDriver' on port 57896.
17/10/25 20:47:36 INFO spark.SparkEnv: Registering MapOutputTracker
17/10/25 20:47:36 INFO spark.SparkEnv: Registering BlockManagerMaster
17/10/25 20:47:36 INFO storage.BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology
information
17/10/25 20:47:36 INFO storage.BlockManagerMasterEndpoint:
BlockManagerMasterEndpoint up
17/10/25 20:47:36 INFO storage.DiskBlockManager: Created local directory at
/private/var/folders/sv/kc2yf26s4qzfb7m36v06028m0000gn/T/blockmgr-6fe2e205-ac6f-4e77-b5a2-7748c7e2b272
17/10/25 20:47:36 INFO memory.MemoryStore: MemoryStore started with
capacity 912.3 MB
17/10/25 20:47:36 INFO spark.SparkEnv: Registering OutputCommitCoordinator
17/10/25 20:47:36 INFO util.log: Logging initialized @7709ms
17/10/25 20:47:36 INFO server.Server: jetty-9.2.z-SNAPSHOT
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@6bd51ed8{/jobs,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@61e3a1fd{/jobs/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@51abf713{/jobs/job,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@eadb475{/jobs/job/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@4d4d48a6{/stages,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@315df4bb{/stages/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@3fc08eec{/stages/stage,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@5cad8b7d{/stages/stage/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@7b02e036{/stages/pool,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@25243bc1{/stages/pool/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@1e287667{/storage,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@2e6ee0bc{/storage/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@4201a617{/storage/rdd,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@467f77a5{/storage/rdd/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@1bb9aa43{/environment,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@420bc288{/environment/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@df5f5c0{/executors,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@308a6984{/executors/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@66b72664{/executors/threadDump,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@7a34b7b8{/executors/threadDump/json,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@58cd06cb{/static,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@3be8821f{/,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@64b31700{/api,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@3b65e559{/jobs/job/kill,null,AVAILABLE}
17/10/25 20:47:36 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@bae47a0{/stages/stage/kill,null,AVAILABLE}
17/10/25 20:47:36 INFO server.ServerConnector: Started
ServerConnector@27f0ad19{HTTP/1.1}{0.0.0.0:4040}
17/10/25 20:47:36 INFO server.Server: Started @7902ms
17/10/25 20:47:36 INFO util.Utils: Successfully started service 'SparkUI'
on port 4040.
17/10/25 20:47:36 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://192.168.0.169:4040
17/10/25 20:47:36 INFO executor.Executor: Starting executor ID driver on
host localhost
17/10/25 20:47:36 INFO util.Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 57897.
17/10/25 20:47:36 INFO netty.NettyBlockTransferService: Server created on
192.168.0.169:57897
17/10/25 20:47:36 INFO storage.BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
17/10/25 20:47:36 INFO storage.BlockManagerMaster: Registering BlockManager
BlockManagerId(driver, 192.168.0.169, 57897, None)
17/10/25 20:47:36 INFO storage.BlockManagerMasterEndpoint: Registering
block manager 192.168.0.169:57897 with 912.3 MB RAM, BlockManagerId(driver,
192.168.0.169, 57897, None)
17/10/25 20:47:36 INFO storage.BlockManagerMaster: Registered BlockManager
BlockManagerId(driver, 192.168.0.169, 57897, None)
17/10/25 20:47:36 INFO storage.BlockManager: Initialized BlockManager:
BlockManagerId(driver, 192.168.0.169, 57897, None)
17/10/25 20:47:37 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@777c350f{/metrics/json,null,AVAILABLE}
17/10/25 20:47:37 INFO internal.SharedState: Warehouse path is
'file:/Users/skanjila/code/opensource/reef/spark-warehouse/'.
17/10/25 20:47:37 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@6d3c232f{/SQL,null,AVAILABLE}
17/10/25 20:47:37 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@1bcf67e8{/SQL/json,null,AVAILABLE}
17/10/25 20:47:37 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@460b6d54{/SQL/execution,null,AVAILABLE}
17/10/25 20:47:37 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@76075d65{/SQL/execution/json,null,AVAILABLE}
17/10/25 20:47:37 INFO handler.ContextHandler: Started
o.s.j.s.ServletContextHandler@ccd1bc3{/static/sql,null,AVAILABLE}
17/10/25 20:47:37 INFO memory.MemoryStore: Block broadcast_0 stored as
values in memory (estimated size 241.4 KB, free 912.1 MB)
17/10/25 20:47:37 INFO memory.MemoryStore: Block broadcast_0_piece0 stored
as bytes in memory (estimated size 23.5 KB, free 912.0 MB)
17/10/25 20:47:37 INFO storage.BlockManagerInfo: Added broadcast_0_piece0
in memory on 192.168.0.169:57897 (size: 23.5 KB, free: 912.3 MB)
17/10/25 20:47:37 INFO spark.SparkContext: Created broadcast 0 from
textFile at SparkRunner.java:63
17/10/25 20:47:38 INFO mapred.FileInputFormat: Total input paths to process
: 1
17/10/25 20:47:38 INFO spark.SparkContext: Starting job: count at
SparkRunner.java:64
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Got job 0 (count at
SparkRunner.java:64) with 6 output partitions
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Final stage: ResultStage 0
(count at SparkRunner.java:64)
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Parents of final stage:
List()
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Missing parents: List()
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Submitting ResultStage 0
(file:///Users/skanjila/code/opensource/reef/pom.xml MapPartitionsRDD[1] at
textFile at SparkRunner.java:63), which has no missing parents
17/10/25 20:47:38 INFO memory.MemoryStore: Block broadcast_1 stored as
values in memory (estimated size 3.2 KB, free 912.0 MB)
17/10/25 20:47:38 INFO memory.MemoryStore: Block broadcast_1_piece0 stored
as bytes in memory (estimated size 1980.0 B, free 912.0 MB)
17/10/25 20:47:38 INFO storage.BlockManagerInfo: Added broadcast_1_piece0
in memory on 192.168.0.169:57897 (size: 1980.0 B, free: 912.3 MB)
17/10/25 20:47:38 INFO spark.SparkContext: Created broadcast 1 from
broadcast at DAGScheduler.scala:996
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Submitting 6 missing tasks
from ResultStage 0 (file:///Users/skanjila/code/opensource/reef/pom.xml
MapPartitionsRDD[1] at textFile at SparkRunner.java:63)
17/10/25 20:47:38 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0
with 6 tasks
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Starting task 0.0 in stage
0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 5913
bytes)
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Starting task 1.0 in stage
0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 5913
bytes)
17/10/25 20:47:38 INFO executor.Executor: Running task 1.0 in stage 0.0
(TID 1)
17/10/25 20:47:38 INFO executor.Executor: Running task 0.0 in stage 0.0
(TID 0)
17/10/25 20:47:38 INFO rdd.HadoopRDD: Input split:
file:/Users/skanjila/code/opensource/reef/pom.xml:0+6610
17/10/25 20:47:38 INFO rdd.HadoopRDD: Input split:
file:/Users/skanjila/code/opensource/reef/pom.xml:6610+6610
17/10/25 20:47:38 INFO Configuration.deprecation: mapred.tip.id is
deprecated. Instead, use mapreduce.task.id
17/10/25 20:47:38 INFO Configuration.deprecation: mapred.task.id is
deprecated. Instead, use mapreduce.task.attempt.id
17/10/25 20:47:38 INFO Configuration.deprecation: mapred.task.is.map is
deprecated. Instead, use mapreduce.task.ismap
17/10/25 20:47:38 INFO Configuration.deprecation: mapred.task.partition is
deprecated. Instead, use mapreduce.task.partition
17/10/25 20:47:38 INFO Configuration.deprecation: mapred.job.id is
deprecated. Instead, use mapreduce.job.id
17/10/25 20:47:38 INFO executor.Executor: Finished task 1.0 in stage 0.0
(TID 1). 1210 bytes result sent to driver
17/10/25 20:47:38 INFO executor.Executor: Finished task 0.0 in stage 0.0
(TID 0). 1210 bytes result sent to driver
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Starting task 2.0 in stage
0.0 (TID 2, localhost, executor driver, partition 2, PROCESS_LOCAL, 5913
bytes)
17/10/25 20:47:38 INFO executor.Executor: Running task 2.0 in stage 0.0
(TID 2)
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Starting task 3.0 in stage
0.0 (TID 3, localhost, executor driver, partition 3, PROCESS_LOCAL, 5913
bytes)
17/10/25 20:47:38 INFO executor.Executor: Running task 3.0 in stage 0.0
(TID 3)
17/10/25 20:47:38 INFO rdd.HadoopRDD: Input split:
file:/Users/skanjila/code/opensource/reef/pom.xml:13220+6610
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Finished task 1.0 in stage
0.0 (TID 1) in 172 ms on localhost (executor driver) (1/6)
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Finished task 0.0 in stage
0.0 (TID 0) in 240 ms on localhost (executor driver) (2/6)
17/10/25 20:47:38 INFO rdd.HadoopRDD: Input split:
file:/Users/skanjila/code/opensource/reef/pom.xml:19830+6610
17/10/25 20:47:38 INFO executor.Executor: Finished task 2.0 in stage 0.0
(TID 2). 1123 bytes result sent to driver
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Starting task 4.0 in stage
0.0 (TID 4, localhost, executor driver, partition 4, PROCESS_LOCAL, 5913
bytes)
17/10/25 20:47:38 INFO executor.Executor: Running task 4.0 in stage 0.0
(TID 4)
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Finished task 2.0 in stage
0.0 (TID 2) in 25 ms on localhost (executor driver) (3/6)
17/10/25 20:47:38 INFO executor.Executor: Finished task 3.0 in stage 0.0
(TID 3). 1123 bytes result sent to driver
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Starting task 5.0 in stage
0.0 (TID 5, localhost, executor driver, partition 5, PROCESS_LOCAL, 5913
bytes)
17/10/25 20:47:38 INFO executor.Executor: Running task 5.0 in stage 0.0
(TID 5)
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Finished task 3.0 in stage
0.0 (TID 3) in 25 ms on localhost (executor driver) (4/6)
17/10/25 20:47:38 INFO rdd.HadoopRDD: Input split:
file:/Users/skanjila/code/opensource/reef/pom.xml:26440+6610
17/10/25 20:47:38 INFO rdd.HadoopRDD: Input split:
file:/Users/skanjila/code/opensource/reef/pom.xml:33050+6613
17/10/25 20:47:38 INFO executor.Executor: Finished task 4.0 in stage 0.0
(TID 4). 1123 bytes result sent to driver
17/10/25 20:47:38 INFO executor.Executor: Finished task 5.0 in stage 0.0
(TID 5). 1123 bytes result sent to driver
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Finished task 4.0 in stage
0.0 (TID 4) in 29 ms on localhost (executor driver) (5/6)
17/10/25 20:47:38 INFO scheduler.TaskSetManager: Finished task 5.0 in stage
0.0 (TID 5) in 26 ms on localhost (executor driver) (6/6)
17/10/25 20:47:38 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,
whose tasks have all completed, from pool
17/10/25 20:47:38 INFO scheduler.DAGScheduler: ResultStage 0 (count at
SparkRunner.java:64) finished in 0.307 s
17/10/25 20:47:38 INFO scheduler.DAGScheduler: Job 0 finished: count at
SparkRunner.java:64, took 0.481987 s
17/10/25 20:47:38 INFO spark.SparkContext: Invoking stop() from shutdown
hook
17/10/25 20:47:38 INFO server.ServerConnector: Stopped
ServerConnector@27f0ad19{HTTP/1.1}{0.0.0.0:4040}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@bae47a0{/stages/stage/kill,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@3b65e559{/jobs/job/kill,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@64b31700{/api,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@3be8821f{/,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@58cd06cb{/static,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@7a34b7b8{/executors/threadDump/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@66b72664{/executors/threadDump,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@308a6984{/executors/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@df5f5c0{/executors,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@420bc288{/environment/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@1bb9aa43{/environment,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@467f77a5{/storage/rdd/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@4201a617{/storage/rdd,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@2e6ee0bc{/storage/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@1e287667{/storage,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@25243bc1{/stages/pool/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@7b02e036{/stages/pool,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@5cad8b7d{/stages/stage/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@3fc08eec{/stages/stage,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@315df4bb{/stages/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@4d4d48a6{/stages,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@eadb475{/jobs/job/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@51abf713{/jobs/job,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@61e3a1fd{/jobs/json,null,UNAVAILABLE}
17/10/25 20:47:38 INFO handler.ContextHandler: Stopped
o.s.j.s.ServletContextHandler@6bd51ed8{/jobs,null,UNAVAILABLE}
17/10/25 20:47:38 INFO ui.SparkUI: Stopped Spark web UI at
http://192.168.0.169:4040
17/10/25 20:47:38 INFO spark.MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
17/10/25 20:47:38 INFO memory.MemoryStore: MemoryStore cleared
17/10/25 20:47:38 INFO storage.BlockManager: BlockManager stopped
17/10/25 20:47:38 INFO storage.BlockManagerMaster: BlockManagerMaster
stopped
17/10/25 20:47:38 INFO
scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
17/10/25 20:47:38 INFO spark.SparkContext: Successfully stopped SparkContext
17/10/25 20:47:38 INFO util.ShutdownHookManager: Shutdown hook called
17/10/25 20:47:38 INFO util.ShutdownHookManager: Deleting directory
/private/var/folders/sv/kc2yf26s4qzfb7m36v06028m0000gn/T/spark-3297daa0-ac98-4b26-a2a5-0305de8efd73


From the above, it looks like the partitions are being created
successfully: the REEF pom.xml is split across the partitions, and the
LineCounter program runs successfully on each one (see the sketch below).
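
For reference, here is a minimal sketch of the kind of logic the log lines
"textFile at SparkRunner.java:63" and "count at SparkRunner.java:64" point
at; the class name and the hard-coded path and partition count below are
illustrative assumptions, not the actual PR code:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public final class LineCounterSketch {
  public static void main(final String[] args) {
    // Local master for a laptop run; on a cluster this would come from
    // spark-submit / the YARN client instead.
    final SparkConf conf = new SparkConf()
        .setAppName("DataLoadingREEFOnSpark")
        .setMaster("local[*]");
    final JavaSparkContext sc = new JavaSparkContext(conf);
    try {
      // Corresponds to "textFile at SparkRunner.java:63": Spark splits the
      // file into partitions -- 6 of them in the run above.
      final JavaRDD<String> lines =
          sc.textFile("file:///Users/skanjila/code/opensource/reef/pom.xml", 6);
      // Corresponds to "count at SparkRunner.java:64": one task per
      // partition counts its share of the lines, exactly the 6 tasks
      // visible in the log.
      System.out.println("Line count: " + lines.count());
    } finally {
      sc.stop();
    }
  }
}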

I will move on to code cleanup now and then send a PR, hopefully soon.
For questions, concerns, or thoughts, please respond to this thread.


Thanks


Re: The plan for reef-1791

Posted by Rogan Carr <ro...@gmail.com>.
Hi Saikat,

Nice work! I'd love to help test this once you have the PR together. I can
spin up a Spark/HDInsight cluster on Azure and try it out when it's ready.

Best,
Rogan


Re: The plan for reef-1791

Posted by Saikat Kanjilal <sx...@gmail.com>.
Hello folks,
I wanted to give a quick end-of-week status on REEF-1791. Here's what I
have working so far:
* Successfully launching the LineCounter program, using the DataLoader
architecture, against the Spark runtime for a local file on a single-node
Hadoop YARN install
* Successfully invoking the flatMap function and having the REEF launcher
run inside it against all the predefined partitions

ToDo:
* Some code cleanup before I submit a PR; no unit tests yet, but I will
add them while we're fleshing out the PR
* Documentation around the chosen architecture

The major changes:
* Addition of a new runtime, reef-runtime-spark, which creates the
SparkContext and launches REEF within that context through a simple
flatMap function for now (see the sketch after this list)
* Had to change the REEF configuration-related classes
(JavaConfigurationBuilderImpl) to implement the Serializable interface,
since each Spark closure requires that every object captured inside it be
serializable; I am wondering about the impact of this (including the
performance impact) on the rest of the REEF codebase
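
To make those two bullets concrete, here is a minimal sketch of the shape
this could take, assuming the per-partition launch goes through a Spark
FlatMapFunction and a small serializable configuration object is shipped
into the closure; ReefJobConfig and runReef are illustrative placeholders
(the real code launches REEF, not a line counter), not the PR's actual
classes:

import java.io.Serializable;
import java.util.Collections;
import java.util.Iterator;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

public final class ReefOnSparkSketch {

  // Anything captured by a Spark closure must be serializable -- this is
  // exactly why the REEF configuration classes had to implement
  // Serializable.
  private static final class ReefJobConfig implements Serializable {
    final String driverIdentifier;
    ReefJobConfig(final String driverIdentifier) {
      this.driverIdentifier = driverIdentifier;
    }
  }

  // Placeholder for the real reef-runtime-spark entry point; it just
  // consumes the partition here so the sketch is self-contained.
  static String runReef(final String driverId, final Iterator<String> partition) {
    int lines = 0;
    while (partition.hasNext()) {
      partition.next();
      lines++;
    }
    return driverId + ": processed " + lines + " lines in this partition";
  }

  public static void launch(final JavaSparkContext sc, final String inputPath) {
    final ReefJobConfig config = new ReefJobConfig("reef-on-spark");
    final JavaRDD<String> input = sc.textFile(inputPath);

    // One launch per partition, from inside the Spark closure. With Java 8
    // this anonymous class collapses to a one-line lambda, which is part
    // of why moving to Java 8 would simplify this runtime.
    final JavaRDD<String> statuses = input.mapPartitions(
        new FlatMapFunction<Iterator<String>, String>() {
          @Override
          public Iterator<String> call(final Iterator<String> partition) {
            return Collections.singletonList(
                runReef(config.driverIdentifier, partition)).iterator();
          }
        });

    for (final String status : statuses.collect()) {
      System.out.println(status);
    }
  }
}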

Please let me know if there are any questions or additional feedback; look
for the CR hopefully in the next week or so.
Thanks in advance.




