Posted to commits@bigtop.apache.org by kw...@apache.org on 2016/12/02 22:47:05 UTC

bigtop git commit: BIGTOP-2561: add juju bundle for hadoop-spark (closes #166)

Repository: bigtop
Updated Branches:
  refs/heads/master 56deaa7c0 -> f4d023b4c


BIGTOP-2561: add juju bundle for hadoop-spark (closes #166)

Signed-off-by: Kevin W Monroe <ke...@canonical.com>


Project: http://git-wip-us.apache.org/repos/asf/bigtop/repo
Commit: http://git-wip-us.apache.org/repos/asf/bigtop/commit/f4d023b4
Tree: http://git-wip-us.apache.org/repos/asf/bigtop/tree/f4d023b4
Diff: http://git-wip-us.apache.org/repos/asf/bigtop/diff/f4d023b4

Branch: refs/heads/master
Commit: f4d023b4c505efbb3c5b52cb0aa7ceb9dc20cc60
Parents: 56deaa7
Author: Kevin W Monroe <ke...@canonical.com>
Authored: Wed Sep 21 20:46:24 2016 -0500
Committer: Kevin W Monroe <ke...@canonical.com>
Committed: Fri Dec 2 16:46:24 2016 -0600

----------------------------------------------------------------------
 bigtop-deploy/juju/hadoop-spark/.gitignore      |   2 +
 bigtop-deploy/juju/hadoop-spark/README.md       | 356 +++++++++++++++++++
 bigtop-deploy/juju/hadoop-spark/bundle-dev.yaml | 138 +++++++
 .../juju/hadoop-spark/bundle-local.yaml         | 138 +++++++
 bigtop-deploy/juju/hadoop-spark/bundle.yaml     | 138 +++++++
 bigtop-deploy/juju/hadoop-spark/copyright       |  16 +
 .../juju/hadoop-spark/tests/01-bundle.py        | 137 +++++++
 .../juju/hadoop-spark/tests/tests.yaml          |   7 +
 8 files changed, 932 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/.gitignore
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/.gitignore b/bigtop-deploy/juju/hadoop-spark/.gitignore
new file mode 100644
index 0000000..a295864
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/.gitignore
@@ -0,0 +1,2 @@
+*.pyc
+__pycache__

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/README.md
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/README.md b/bigtop-deploy/juju/hadoop-spark/README.md
new file mode 100644
index 0000000..b2b936b
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/README.md
@@ -0,0 +1,356 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+# Overview
+
+The Apache Hadoop software library is a framework that allows for the
+distributed processing of large data sets across clusters of computers
+using a simple programming model.
+
+Hadoop is designed to scale from a few servers to thousands of machines,
+each offering local computation and storage. Rather than rely on hardware
+to deliver high availability, Hadoop can detect and handle failures at the
+application layer. This provides a highly available service on top of a cluster
+of machines, each of which may be prone to failure.
+
+Spark is a fast and general engine for large-scale data processing.
+
+This bundle provides a complete deployment of Hadoop and Spark components from
+[Apache Bigtop][] that performs distributed data processing at scale. Ganglia
+and rsyslog applications are also provided to monitor cluster health and syslog
+activity.
+
+[Apache Bigtop]: http://bigtop.apache.org/
+
+## Bundle Composition
+
+The applications that comprise this bundle are spread across nine units as
+follows:
+
+  * NameNode (HDFS)
+  * ResourceManager (YARN)
+    * Colocated on the NameNode unit
+  * Slave (DataNode and NodeManager)
+    * 3 separate units
+  * Spark
+  * Plugin (Facilitates communication with the Hadoop cluster)
+    * Colocated on the Spark unit
+  * Client (Hadoop endpoint)
+    * Colocated on the Spark unit
+  * Zookeeper
+    * 3 separate units
+  * Ganglia (Web interface for monitoring cluster metrics)
+  * Rsyslog (Aggregates cluster syslog events in a single location)
+    * Colocated on the Ganglia unit
+
+Deploying this bundle results in a fully configured Apache Bigtop
+cluster on any supported cloud; the cluster can be scaled to meet
+workload demands.
+
+
+# Deploying
+
+A working Juju installation is assumed to be present. If Juju is not yet set
+up, please follow the [getting-started][] instructions prior to deploying this
+bundle.
+
+> **Note**: This bundle requires hardware resources that may exceed the
+limits of free-tier or trial accounts on some clouds. To deploy to these
+environments, modify a local copy of [bundle.yaml][] to set
+`services: 'X': num_units: 1` and `machines: 'X': constraints: mem=3G` as
+needed to satisfy account limits (see the sketch below).
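+
+For example, a local copy might be edited along these lines (a minimal
+sketch; the `slave` application and machine `"1"` are taken from this
+bundle, and the values shown are illustrative):
+
+    services:
+      slave:
+        num_units: 1
+    machines:
+      "1":
+        constraints: mem=3G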
+
+Deploy this bundle from the Juju charm store with the `juju deploy` command:
+
+    juju deploy hadoop-spark
+
+> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version
+of Juju, use [juju-quickstart][] with the following syntax: `juju quickstart
+hadoop-spark`.
+
+Alternatively, deploy a locally modified `bundle.yaml` with:
+
+    juju deploy /path/to/bundle.yaml
+
+> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version
+of Juju, use [juju-quickstart][] with the following syntax: `juju quickstart
+/path/to/bundle.yaml`.
+
+The charms in this bundle can also be built from their source layers in the
+[Bigtop charm repository][].  See the [Bigtop charm README][] for instructions
+on building and deploying these charms locally.
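+
+As a rough sketch, building and deploying a single charm from source might
+look like the following (the layer path and build output location are
+illustrative and depend on the repository layout and local charm-tools
+configuration; the linked README is authoritative):
+
+    git clone https://github.com/apache/bigtop.git
+    cd bigtop/bigtop-packages/src/charm/spark/layer-spark
+    charm build
+    juju deploy $JUJU_REPOSITORY/builds/spark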
+
+## Network-Restricted Environments
+Charms can be deployed in environments with limited network access. To deploy
+in such an environment, configure a Juju model with appropriate proxy and/or
+mirror options. See [Configuring Models][] for more information.
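+
+For example, a minimal sketch that routes traffic through an HTTP proxy at
+`http://squid.internal:3128` (a placeholder address):
+
+    juju model-config http-proxy=http://squid.internal:3128
+    juju model-config https-proxy=http://squid.internal:3128
+    juju model-config apt-http-proxy=http://squid.internal:3128
+
+> **Note**: The above assumes Juju 2.0 or greater.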
+
+[getting-started]: https://jujucharms.com/docs/stable/getting-started
+[bundle.yaml]: https://github.com/apache/bigtop/blob/master/bigtop-deploy/juju/hadoop-spark/bundle.yaml
+[juju-quickstart]: https://launchpad.net/juju-quickstart
+[Bigtop charm repository]: https://github.com/apache/bigtop/tree/master/bigtop-packages/src/charm
+[Bigtop charm README]: https://github.com/apache/bigtop/blob/master/bigtop-packages/src/charm/README.md
+[Configuring Models]: https://jujucharms.com/docs/stable/models-config
+
+
+# Verifying
+
+## Status
+The applications that make up this bundle provide status messages to indicate
+when they are ready:
+
+    juju status
+
+This is particularly useful when combined with `watch` to track the ongoing
+progress of the deployment:
+
+    watch -n 2 juju status
+
+The message for each unit will provide information about that unit's state.
+Once they all indicate that they are ready, perform application smoke tests
+to verify that the bundle is working as expected.
+
+## Smoke Test
+The charms for each core component (namenode, resourcemanager, slave, spark,
+and zookeeper) provide a `smoke-test` action that can be used to verify the
+application is functioning as expected. Note that the 'slave' component runs
+extensive tests provided by Apache Bigtop and may take up to 30 minutes to
+complete. Run the smoke-test actions as follows:
+
+    juju run-action namenode/0 smoke-test
+    juju run-action resourcemanager/0 smoke-test
+    juju run-action slave/0 smoke-test
+    juju run-action spark/0 smoke-test
+    juju run-action zookeeper/0 smoke-test
+
+> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version
+of Juju, the syntax is `juju action do <application>/0 smoke-test`.
+
+Watch the progress of the smoke test actions with:
+
+    watch -n 2 juju show-action-status
+
+> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version
+of Juju, the syntax is `juju action status`.
+
+Eventually, all of the actions should settle to `status: completed`.  If
+any report `status: failed`, that application is not working as expected. Get
+more information about a specific smoke test with:
+
+    juju show-action-output <action-id>
+
+> **Note**: The above assumes Juju 2.0 or greater. If using an earlier version
+of Juju, the syntax is `juju action fetch <action-id>`.
+
+## Utilities
+Applications in this bundle include command line and web utilities that
+can be used to verify information about the cluster.
+
+From the command line, show the HDFS dfsadmin report and view the current list
+of YARN NodeManager units with the following:
+
+    juju run --application namenode "su hdfs -c 'hdfs dfsadmin -report'"
+    juju run --application resourcemanager "su yarn -c 'yarn node -list'"
+
+Show the list of Zookeeper nodes with the following:
+
+    juju run --unit zookeeper/0 'echo "ls /" | /usr/lib/zookeeper/bin/zkCli.sh'
+
+To access the HDFS web console, find the `PUBLIC-ADDRESS` of the namenode
+application and expose it:
+
+    juju status namenode
+    juju expose namenode
+
+The web interface will be available at the following URL:
+
+    http://NAMENODE_PUBLIC_IP:50070
+
+Similarly, to access the Resource Manager web consoles, find the
+`PUBLIC-ADDRESS` of the resourcemanager application and expose it:
+
+    juju status resourcemanager
+    juju expose resourcemanager
+
+The YARN and Job History web interfaces will be available at the following URLs:
+
+    http://RESOURCEMANAGER_PUBLIC_IP:8088
+    http://RESOURCEMANAGER_PUBLIC_IP:19888
+
+Finally, to access the Spark web console, find the `PUBLIC-ADDRESS` of the
+spark application and expose it:
+
+    juju status spark
+    juju expose spark
+
+The web interface will be available at the following URL:
+
+    http://SPARK_PUBLIC_IP:8080
+
+
+# Monitoring
+
+This bundle includes Ganglia for system-level monitoring of the namenode,
+resourcemanager, slave, spark, and zookeeper units. Metrics are sent to a
+centralized ganglia unit for easy viewing in a browser. To view the ganglia web
+interface, find the `PUBLIC-ADDRESS` of the ganglia application and expose it:
+
+    juju status ganglia
+    juju expose ganglia
+
+The web interface will be available at:
+
+    http://GANGLIA_PUBLIC_IP/ganglia
+
+
+# Logging
+
+This bundle includes rsyslog to collect syslog data from the namenode,
+resourcemanager, slave, spark, and zookeeper units. These logs are sent to a
+centralized rsyslog unit for easy syslog analysis. One way to view this log
+data is to cat the syslog file on the rsyslog unit:
+
+    juju run --unit rsyslog/0 'sudo cat /var/log/syslog'
+
+Logs may also be forwarded to an external rsyslog processing service. See
+the *Forwarding logs to a system outside of the Juju environment* section of
+the [rsyslog README](https://jujucharms.com/rsyslog/) for more information.
+
+
+# Benchmarking
+
+The `resourcemanager` charm in this bundle provides several benchmarks to gauge
+the performance of the Hadoop cluster. Each benchmark is an action that can be
+run with `juju run-action`:
+
+    $ juju actions resourcemanager
+    ACTION      DESCRIPTION
+    mrbench     Mapreduce benchmark for small jobs
+    nnbench     Load test the NameNode hardware and configuration
+    smoke-test  Run an Apache Bigtop smoke test.
+    teragen     Generate data with teragen
+    terasort    Runs teragen to generate sample data, and then runs terasort to sort that data
+    testdfsio   DFS IO Testing
+
+    $ juju run-action resourcemanager/0 nnbench
+    Action queued with id: 55887b40-116c-4020-8b35-1e28a54cc622
+
+    $ juju show-action-output 55887b40-116c-4020-8b35-1e28a54cc622
+    results:
+      meta:
+        composite:
+          direction: asc
+          units: secs
+          value: "128"
+        start: 2016-02-04T14:55:39Z
+        stop: 2016-02-04T14:57:47Z
+      results:
+        raw: '{"BAD_ID": "0", "FILE: Number of read operations": "0", "Reduce input groups":
+          "8", "Reduce input records": "95", "Map output bytes": "1823", "Map input records":
+          "12", "Combine input records": "0", "HDFS: Number of bytes read": "18635", "FILE:
+          Number of bytes written": "32999982", "HDFS: Number of write operations": "330",
+          "Combine output records": "0", "Total committed heap usage (bytes)": "3144749056",
+          "Bytes Written": "164", "WRONG_LENGTH": "0", "Failed Shuffles": "0", "FILE:
+          Number of bytes read": "27879457", "WRONG_MAP": "0", "Spilled Records": "190",
+          "Merged Map outputs": "72", "HDFS: Number of large read operations": "0", "Reduce
+          shuffle bytes": "2445", "FILE: Number of large read operations": "0", "Map output
+          materialized bytes": "2445", "IO_ERROR": "0", "CONNECTION": "0", "HDFS: Number
+          of read operations": "567", "Map output records": "95", "Reduce output records":
+          "8", "WRONG_REDUCE": "0", "HDFS: Number of bytes written": "27412", "GC time
+          elapsed (ms)": "603", "Input split bytes": "1610", "Shuffled Maps ": "72", "FILE:
+          Number of write operations": "0", "Bytes Read": "1490"}'
+    status: completed
+    timing:
+      completed: 2016-02-04 14:57:48 +0000 UTC
+      enqueued: 2016-02-04 14:55:14 +0000 UTC
+      started: 2016-02-04 14:55:27 +0000 UTC
+
+The `spark` charm in this bundle also provides several benchmarks to gauge
+the performance of the Spark cluster. Each benchmark is an action that can be
+run with `juju run-action`:
+
+    $ juju actions spark | grep Bench
+    connectedcomponent                Run the Spark Bench ConnectedComponent benchmark.
+    decisiontree                      Run the Spark Bench DecisionTree benchmark.
+    kmeans                            Run the Spark Bench KMeans benchmark.
+    linearregression                  Run the Spark Bench LinearRegression benchmark.
+    logisticregression                Run the Spark Bench LogisticRegression benchmark.
+    matrixfactorization               Run the Spark Bench MatrixFactorization benchmark.
+    pagerank                          Run the Spark Bench PageRank benchmark.
+    pca                               Run the Spark Bench PCA benchmark.
+    pregeloperation                   Run the Spark Bench PregelOperation benchmark.
+    shortestpaths                     Run the Spark Bench ShortestPaths benchmark.
+    sql                               Run the Spark Bench SQL benchmark.
+    stronglyconnectedcomponent        Run the Spark Bench StronglyConnectedComponent benchmark.
+    svdplusplus                       Run the Spark Bench SVDPlusPlus benchmark.
+    svm                               Run the Spark Bench SVM benchmark.
+
+    $ juju run-action spark/0 svdplusplus
+    Action queued with id: 339cec1f-e903-4ee7-85ca-876fb0c3d28e
+
+    $ juju show-action-output 339cec1f-e903-4ee7-85ca-876fb0c3d28e
+    results:
+      meta:
+        composite:
+          direction: asc
+          units: secs
+          value: "200.754000"
+        raw: |
+          SVDPlusPlus,2016-11-02-03:08:26,200.754000,85.974071,.428255,0,SVDPlusPlus-MLlibConfig,,,,,10,,,50000,4.0,1.3,
+        start: 2016-11-02T03:08:26Z
+        stop: 2016-11-02T03:11:47Z
+      results:
+        duration:
+          direction: asc
+          units: secs
+          value: "200.754000"
+        throughput:
+          direction: desc
+          units: MB/sec
+          value: ".428255"
+    status: completed
+    timing:
+      completed: 2016-11-02 03:11:48 +0000 UTC
+      enqueued: 2016-11-02 03:08:21 +0000 UTC
+      started: 2016-11-02 03:08:26 +0000 UTC
+
+
+# Scaling
+
+By default, three Hadoop slave and three zookeeper units are deployed. Scaling
+these applications is as simple as adding more units. To add one unit of each:
+
+    juju add-unit slave
+    juju add-unit zookeeper
+
+Multiple units may be added at once.  For example, add four more slave units:
+
+    juju add-unit -n4 slave
+
+
+# Contact Information
+
+- <bi...@lists.ubuntu.com>
+
+
+# Resources
+
+- [Apache Bigtop](http://bigtop.apache.org/) home page
+- [Apache Bigtop issue tracking](http://bigtop.apache.org/issue-tracking.html)
+- [Apache Bigtop mailing lists](http://bigtop.apache.org/mail-lists.html)
+- [Juju Bigtop charms](https://jujucharms.com/q/apache/bigtop)
+- [Juju mailing list](https://lists.ubuntu.com/mailman/listinfo/juju)
+- [Juju community](https://jujucharms.com/community)

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/bundle-dev.yaml
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/bundle-dev.yaml b/bigtop-deploy/juju/hadoop-spark/bundle-dev.yaml
new file mode 100644
index 0000000..35623fd
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/bundle-dev.yaml
@@ -0,0 +1,138 @@
+services:
+  namenode:
+    charm: "cs:~bigdata-dev/xenial/hadoop-namenode"
+    num_units: 1
+    annotations:
+      gui-x: "500"
+      gui-y: "800"
+    to:
+      - "0"
+  resourcemanager:
+    charm: "cs:~bigdata-dev/xenial/hadoop-resourcemanager"
+    num_units: 1
+    annotations:
+      gui-x: "500"
+      gui-y: "0"
+    to:
+      - "0"
+  slave:
+    charm: "cs:~bigdata-dev/xenial/hadoop-slave"
+    num_units: 3
+    annotations:
+      gui-x: "0"
+      gui-y: "400"
+    to:
+      - "1"
+      - "2"
+      - "3"
+  plugin:
+    charm: "cs:~bigdata-dev/xenial/hadoop-plugin"
+    annotations:
+      gui-x: "1000"
+      gui-y: "400"
+  client:
+    charm: "cs:xenial/hadoop-client-2"
+    num_units: 1
+    annotations:
+      gui-x: "1250"
+      gui-y: "400"
+    to:
+      - "4"
+  spark:
+    charm: "cs:~bigdata-dev/xenial/spark"
+    num_units: 1
+    options:
+      spark_execution_mode: "yarn-client"
+    annotations:
+      gui-x: "1000"
+      gui-y: "0"
+    to:
+      - "4"
+  zookeeper:
+    charm: "cs:xenial/zookeeper-10"
+    num_units: 3
+    annotations:
+      gui-x: "500"
+      gui-y: "400"
+    to:
+      - "5"
+      - "6"
+      - "7"
+  ganglia:
+    charm: "cs:~bigdata-dev/xenial/ganglia-5"
+    num_units: 1
+    annotations:
+      gui-x: "0"
+      gui-y: "800"
+    to:
+      - "8"
+  ganglia-node:
+    charm: "cs:~bigdata-dev/xenial/ganglia-node-6"
+    annotations:
+      gui-x: "250"
+      gui-y: "400"
+  rsyslog:
+    charm: "cs:~bigdata-dev/xenial/rsyslog-6"
+    num_units: 1
+    annotations:
+      gui-x: "1000"
+      gui-y: "800"
+    to:
+      - "8"
+  rsyslog-forwarder-ha:
+    charm: "cs:~bigdata-dev/xenial/rsyslog-forwarder-ha-7"
+    annotations:
+      gui-x: "750"
+      gui-y: "400"
+series: xenial
+relations:
+  - [resourcemanager, namenode]
+  - [namenode, slave]
+  - [resourcemanager, slave]
+  - [plugin, namenode]
+  - [plugin, resourcemanager]
+  - [client, plugin]
+  - [spark, plugin]
+  - [spark, zookeeper]
+  - ["ganglia-node:juju-info", "client:juju-info"]
+  - ["ganglia-node:juju-info", "namenode:juju-info"]
+  - ["ganglia-node:juju-info", "resourcemanager:juju-info"]
+  - ["ganglia-node:juju-info", "slave:juju-info"]
+  - ["ganglia-node:juju-info", "spark:juju-info"]
+  - ["ganglia-node:juju-info", "zookeeper:juju-info"]
+  - ["ganglia:node", "ganglia-node:node"]
+  - ["rsyslog-forwarder-ha:juju-info", "client:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "namenode:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "resourcemanager:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "slave:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "spark:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "zookeeper:juju-info"]
+  - ["rsyslog:aggregator", "rsyslog-forwarder-ha:syslog"]
+machines:
+  "0":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "1":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "2":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "3":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "4":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "5":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "6":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "7":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "8":
+    constraints: "mem=3G"
+    series: "xenial"

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/bundle-local.yaml
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/bundle-local.yaml b/bigtop-deploy/juju/hadoop-spark/bundle-local.yaml
new file mode 100644
index 0000000..160683a
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/bundle-local.yaml
@@ -0,0 +1,138 @@
+services:
+  namenode:
+    charm: "/home/ubuntu/charms/xenial/hadoop-namenode"
+    num_units: 1
+    annotations:
+      gui-x: "500"
+      gui-y: "800"
+    to:
+      - "0"
+  resourcemanager:
+    charm: "/home/ubuntu/charms/xenial/hadoop-resourcemanager"
+    num_units: 1
+    annotations:
+      gui-x: "500"
+      gui-y: "0"
+    to:
+      - "0"
+  slave:
+    charm: "/home/ubuntu/charms/xenial/hadoop-slave"
+    num_units: 3
+    annotations:
+      gui-x: "0"
+      gui-y: "400"
+    to:
+      - "1"
+      - "2"
+      - "3"
+  plugin:
+    charm: "/home/ubuntu/charms/xenial/hadoop-plugin"
+    annotations:
+      gui-x: "1000"
+      gui-y: "400"
+  client:
+    charm: "cs:xenial/hadoop-client-2"
+    num_units: 1
+    annotations:
+      gui-x: "1250"
+      gui-y: "400"
+    to:
+      - "4"
+  spark:
+    charm: "/home/ubuntu/charms/xenial/spark"
+    num_units: 1
+    options:
+      spark_execution_mode: "yarn-client"
+    annotations:
+      gui-x: "1000"
+      gui-y: "0"
+    to:
+      - "4"
+  zookeeper:
+    charm: "cs:xenial/zookeeper-10"
+    num_units: 3
+    annotations:
+      gui-x: "500"
+      gui-y: "400"
+    to:
+      - "5"
+      - "6"
+      - "7"
+  ganglia:
+    charm: "cs:~bigdata-dev/xenial/ganglia-5"
+    num_units: 1
+    annotations:
+      gui-x: "0"
+      gui-y: "800"
+    to:
+      - "8"
+  ganglia-node:
+    charm: "cs:~bigdata-dev/xenial/ganglia-node-6"
+    annotations:
+      gui-x: "250"
+      gui-y: "400"
+  rsyslog:
+    charm: "cs:~bigdata-dev/xenial/rsyslog-6"
+    num_units: 1
+    annotations:
+      gui-x: "1000"
+      gui-y: "800"
+    to:
+      - "8"
+  rsyslog-forwarder-ha:
+    charm: "cs:~bigdata-dev/xenial/rsyslog-forwarder-ha-7"
+    annotations:
+      gui-x: "750"
+      gui-y: "400"
+series: xenial
+relations:
+  - [resourcemanager, namenode]
+  - [namenode, slave]
+  - [resourcemanager, slave]
+  - [plugin, namenode]
+  - [plugin, resourcemanager]
+  - [client, plugin]
+  - [spark, plugin]
+  - [spark, zookeeper]
+  - ["ganglia-node:juju-info", "client:juju-info"]
+  - ["ganglia-node:juju-info", "namenode:juju-info"]
+  - ["ganglia-node:juju-info", "resourcemanager:juju-info"]
+  - ["ganglia-node:juju-info", "slave:juju-info"]
+  - ["ganglia-node:juju-info", "spark:juju-info"]
+  - ["ganglia-node:juju-info", "zookeeper:juju-info"]
+  - ["ganglia:node", "ganglia-node:node"]
+  - ["rsyslog-forwarder-ha:juju-info", "client:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "namenode:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "resourcemanager:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "slave:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "spark:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "zookeeper:juju-info"]
+  - ["rsyslog:aggregator", "rsyslog-forwarder-ha:syslog"]
+machines:
+  "0":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "1":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "2":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "3":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "4":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "5":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "6":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "7":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "8":
+    constraints: "mem=3G"
+    series: "xenial"

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/bundle.yaml
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/bundle.yaml b/bigtop-deploy/juju/hadoop-spark/bundle.yaml
new file mode 100644
index 0000000..67b9bb7
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/bundle.yaml
@@ -0,0 +1,138 @@
+services:
+  namenode:
+    charm: "cs:xenial/hadoop-namenode-6"
+    num_units: 1
+    annotations:
+      gui-x: "500"
+      gui-y: "800"
+    to:
+      - "0"
+  resourcemanager:
+    charm: "cs:xenial/hadoop-resourcemanager-6"
+    num_units: 1
+    annotations:
+      gui-x: "500"
+      gui-y: "0"
+    to:
+      - "0"
+  slave:
+    charm: "cs:xenial/hadoop-slave-6"
+    num_units: 3
+    annotations:
+      gui-x: "0"
+      gui-y: "400"
+    to:
+      - "1"
+      - "2"
+      - "3"
+  plugin:
+    charm: "cs:xenial/hadoop-plugin-6"
+    annotations:
+      gui-x: "1000"
+      gui-y: "400"
+  client:
+    charm: "cs:xenial/hadoop-client-2"
+    num_units: 1
+    annotations:
+      gui-x: "1250"
+      gui-y: "400"
+    to:
+      - "4"
+  spark:
+    charm: "cs:xenial/spark-15"
+    num_units: 1
+    options:
+      spark_execution_mode: "yarn-client"
+    annotations:
+      gui-x: "1000"
+      gui-y: "0"
+    to:
+      - "4"
+  zookeeper:
+    charm: "cs:xenial/zookeeper-10"
+    num_units: 3
+    annotations:
+      gui-x: "500"
+      gui-y: "400"
+    to:
+      - "5"
+      - "6"
+      - "7"
+  ganglia:
+    charm: "cs:~bigdata-dev/xenial/ganglia-5"
+    num_units: 1
+    annotations:
+      gui-x: "0"
+      gui-y: "800"
+    to:
+      - "8"
+  ganglia-node:
+    charm: "cs:~bigdata-dev/xenial/ganglia-node-6"
+    annotations:
+      gui-x: "250"
+      gui-y: "400"
+  rsyslog:
+    charm: "cs:~bigdata-dev/xenial/rsyslog-6"
+    num_units: 1
+    annotations:
+      gui-x: "1000"
+      gui-y: "800"
+    to:
+      - "8"
+  rsyslog-forwarder-ha:
+    charm: "cs:~bigdata-dev/xenial/rsyslog-forwarder-ha-7"
+    annotations:
+      gui-x: "750"
+      gui-y: "400"
+series: xenial
+relations:
+  - [resourcemanager, namenode]
+  - [namenode, slave]
+  - [resourcemanager, slave]
+  - [plugin, namenode]
+  - [plugin, resourcemanager]
+  - [client, plugin]
+  - [spark, plugin]
+  - [spark, zookeeper]
+  - ["ganglia-node:juju-info", "client:juju-info"]
+  - ["ganglia-node:juju-info", "namenode:juju-info"]
+  - ["ganglia-node:juju-info", "resourcemanager:juju-info"]
+  - ["ganglia-node:juju-info", "slave:juju-info"]
+  - ["ganglia-node:juju-info", "spark:juju-info"]
+  - ["ganglia-node:juju-info", "zookeeper:juju-info"]
+  - ["ganglia:node", "ganglia-node:node"]
+  - ["rsyslog-forwarder-ha:juju-info", "client:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "namenode:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "resourcemanager:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "slave:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "spark:juju-info"]
+  - ["rsyslog-forwarder-ha:juju-info", "zookeeper:juju-info"]
+  - ["rsyslog:aggregator", "rsyslog-forwarder-ha:syslog"]
+machines:
+  "0":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "1":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "2":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "3":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "4":
+    constraints: "mem=7G root-disk=32G"
+    series: "xenial"
+  "5":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "6":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "7":
+    constraints: "mem=3G root-disk=32G"
+    series: "xenial"
+  "8":
+    constraints: "mem=3G"
+    series: "xenial"

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/copyright
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/copyright b/bigtop-deploy/juju/hadoop-spark/copyright
new file mode 100644
index 0000000..e900b97
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/copyright
@@ -0,0 +1,16 @@
+Format: http://dep.debian.net/deps/dep5/
+
+Files: *
+Copyright: Copyright 2015, Canonical Ltd., All Rights Reserved.
+License: Apache License 2.0
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ .
+     http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/tests/01-bundle.py
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/tests/01-bundle.py b/bigtop-deploy/juju/hadoop-spark/tests/01-bundle.py
new file mode 100755
index 0000000..ba292bc
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/tests/01-bundle.py
@@ -0,0 +1,137 @@
+#!/usr/bin/env python3
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import amulet
+import os
+import re
+import unittest
+import yaml
+
+
+class TestBundle(unittest.TestCase):
+    bundle_file = os.path.join(os.path.dirname(__file__), '..', 'bundle.yaml')
+
+    @classmethod
+    def setUpClass(cls):
+        # classmethod inheritance doesn't work quite right with
+        # setUpClass / tearDownClass, so subclasses have to manually call this
+        cls.d = amulet.Deployment(series='xenial')
+        with open(cls.bundle_file) as f:
+            bun = f.read()
+        bundle = yaml.safe_load(bun)
+
+        # NB: strip machine ('to') placement out. amulet loses our machine spec
+        # somewhere between yaml and json; without that spec, charms specifying
+        # machine placement will not deploy. This is ok for now because all
+        # charms in this bundle are using 'reset: false' so we'll already
+        # have our deployment just the way we want it by the time this test
+        # runs. However, it's bad. Remove once this is fixed:
+        #  https://github.com/juju/amulet/issues/148
+        for service, service_config in bundle['services'].items():
+            if 'to' in service_config:
+                del service_config['to']
+
+        cls.d.load(bundle)
+        cls.d.setup(timeout=3600)
+
+        # we need units reporting ready before we attempt our smoke tests
+        cls.d.sentry.wait_for_messages({'client': re.compile('ready'),
+                                        'namenode': re.compile('ready'),
+                                        'resourcemanager': re.compile('ready'),
+                                        'slave': re.compile('ready'),
+                                        'spark': re.compile('ready'),
+                                        }, timeout=3600)
+        cls.hdfs = cls.d.sentry['namenode'][0]
+        cls.yarn = cls.d.sentry['resourcemanager'][0]
+        cls.slave = cls.d.sentry['slave'][0]
+        cls.spark = cls.d.sentry['spark'][0]
+
+    def test_components(self):
+        """
+        Confirm that all of the required components are up and running.
+        """
+        hdfs, retcode = self.hdfs.run("pgrep -a java")
+        yarn, retcode = self.yarn.run("pgrep -a java")
+        slave, retcode = self.slave.run("pgrep -a java")
+        spark, retcode = self.spark.run("pgrep -a java")
+
+        assert 'NameNode' in hdfs, "NameNode not started"
+        assert 'NameNode' not in slave, "NameNode should not be running on slave"
+
+        assert 'ResourceManager' in yarn, "ResourceManager not started"
+        assert 'ResourceManager' not in slave, "ResourceManager should not be running on slave"
+
+        assert 'JobHistoryServer' in yarn, "JobHistoryServer not started"
+        assert 'JobHistoryServer' not in slave, "JobHistoryServer should not be running on slave"
+
+        assert 'NodeManager' in slave, "NodeManager not started"
+        assert 'NodeManager' not in yarn, "NodeManager should not be running on resourcemanager"
+        assert 'NodeManager' not in hdfs, "NodeManager should not be running on namenode"
+
+        assert 'DataNode' in slave, "DataNode not started"
+        assert 'DataNode' not in yarn, "DataNode should not be running on resourcemanager"
+        assert 'DataNode' not in hdfs, "DataNode should not be running on namenode"
+
+        assert 'Master' in spark, "Spark Master not started"
+
+    def test_hdfs(self):
+        """
+        Validates mkdir, ls, chmod, and rm HDFS operations.
+        """
+        uuid = self.hdfs.run_action('smoke-test')
+        result = self.d.action_fetch(uuid, timeout=600, full_output=True)
+        # action status=completed on success
+        if (result['status'] != "completed"):
+            self.fail('HDFS smoke-test did not complete: %s' % result)
+
+    def test_yarn(self):
+        """
+        Validates YARN using the Bigtop 'yarn' smoke test.
+        """
+        uuid = self.yarn.run_action('smoke-test')
+        # 'yarn' smoke takes a while (bigtop tests download lots of stuff)
+        result = self.d.action_fetch(uuid, timeout=1800, full_output=True)
+        # action status=completed on success
+        if (result['status'] != "completed"):
+            self.fail('YARN smoke-test did not complete: %s' % result)
+
+    def test_spark(self):
+        """
+        Validates Spark with a simple sparkpi test.
+        """
+        uuid = self.spark.run_action('smoke-test')
+        result = self.d.action_fetch(uuid, timeout=600, full_output=True)
+        # action status=completed on success
+        if (result['status'] != "completed"):
+            self.fail('Spark smoke-test did not complete: %s' % result)
+
+    @unittest.skip(
+        'Skipping slave smoke tests; they are too inconsistent and long running for CWR.')
+    def test_slave(self):
+        """
+        Validates slave using the Bigtop 'hdfs' and 'mapred' smoke test.
+        """
+        uuid = self.slave.run_action('smoke-test')
+        # 'hdfs+mapred' smoke takes a long while (bigtop tests are slow)
+        result = self.d.action_fetch(uuid, timeout=3600, full_output=True)
+        # action status=completed on success
+        if (result['status'] != "completed"):
+            self.fail('Slave smoke-test did not complete: %s' % result)
+
+
+if __name__ == '__main__':
+    unittest.main()

http://git-wip-us.apache.org/repos/asf/bigtop/blob/f4d023b4/bigtop-deploy/juju/hadoop-spark/tests/tests.yaml
----------------------------------------------------------------------
diff --git a/bigtop-deploy/juju/hadoop-spark/tests/tests.yaml b/bigtop-deploy/juju/hadoop-spark/tests/tests.yaml
new file mode 100644
index 0000000..c9325b0
--- /dev/null
+++ b/bigtop-deploy/juju/hadoop-spark/tests/tests.yaml
@@ -0,0 +1,7 @@
+reset: false
+deployment_timeout: 7200
+sources:
+  - 'ppa:juju/stable'
+packages:
+  - amulet
+  - python3-yaml