Posted to dev@sdap.apache.org by GitBox <gi...@apache.org> on 2022/09/30 01:23:52 UTC

[GitHub] [incubator-sdap-nexus] skorper commented on a diff in pull request #185: SDAP-399: Quickstart update - Docker

skorper commented on code in PR #185:
URL: https://github.com/apache/incubator-sdap-nexus/pull/185#discussion_r984133683


##########
docs/quickstart.rst:
##########
@@ -18,43 +18,59 @@ This quickstart guide will walk you through how to install and run NEXUS on your
 Prerequisites
 ==============
 
-* Docker (tested on v18.03.1-ce)
+* Docker (tested on v20.10.17)
 * Internet Connection
-* bash
+* bash or zsh
 * cURL
-* 500 MB of disk space
+* 10.5 GB of disk space
 
 Prepare
 ========
 
-Start downloading the Docker images and data files.
+Start downloading the Docker images and set up the Docker bridge network.
 
 .. _quickstart-step1:
 
 Pull Docker Images
 -------------------
 
-Pull the necessary Docker images from the `SDAP repository <https://hub.docker.com/u/sdap>`_ on Docker Hub. Please check the repository for the latest version tag.
+Pull the necessary Docker images from the `NEXUS JPL repository <https://hub.docker.com/u/nexusjpl>`_ on Docker Hub. Please check the repository for the latest version tag.
 
 .. code-block:: bash
 
-  export VERSION=1.0.0-rc1
+  export CASSANDRA_VERSION=3.11.6-debian-10-r138
+  export RMQ_VERSION=3.8.9-debian-10-r37
+  export COLLECTION_MANAGER_VERSION=0.1.6a14
+  export GRANULE_INGESTER_VERSION=0.1.6a30
+  export WEBAPP_VERSION=distributed.0.4.5a49
+  export SOLR_VERSION=8.11.1
+  export SOLR_CLOUD_INIT_VERSION=1.0.2
+  export ZK_VERSION=3.5.5
+
+  export JUPYTER_VERSION=1.2
 
 .. code-block:: bash
 
-  docker pull sdap/ningester:${VERSION}
-  docker pull sdap/solr-singlenode:${VERSION}
-  docker pull sdap/cassandra:${VERSION}
-  docker pull sdap/nexus-webapp:standalone.${VERSION}
+  docker pull bitnami/cassandra:${CASSANDRA_VERSION}
+  docker pull bitnami/rabbitmq:${RMQ_VERSION}
+  docker pull nexusjpl/collection-manager:${COLLECTION_MANAGER_VERSION}
+  docker pull nexusjpl/granule-ingester:${GRANULE_INGESTER_VERSION}
+  docker pull nexusjpl/nexus-webapp:${WEBAPP_VERSION}
+  docker pull nexusjpl/solr:${SOLR_VERSION}
+  docker pull nexusjpl/solr-cloud-init:${SOLR_CLOUD_INIT_VERSION}
+  docker pull zookeeper:${ZK_VERSION}
+
+  # docker pull nexusjpl/jupyter:${JUPYTER_VERSION}

Review Comment:
   let's update this back to nexusjpl before we merge this



##########
CHANGELOG.md:
##########
@@ -41,9 +41,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Fixed issue where satellite to satellite matchups with the same dataset don't return the expected result
 - Fixed CSV and NetCDF matchup output bug
 - Fixed NetCDF output switching latitude and longitude
+- SDAP-399: Updated quickstart guide for standalone docker deployment of SDAP.
+- SDAP-399: Updated quickstart Jupyter notebook
 - Fixed import error causing `/timeSeriesSpark` queries to fail.
 - Fixed bug where domsresults no longer worked after successful matchup
 - Fixed certificate error in Dockerfile
-### Security

Review Comment:
   you can leave this section in (empty) in case we make a security change



##########
docker/jupyter/Dockerfile:
##########
@@ -29,7 +29,8 @@ ENV CHOWN_HOME_OPTS='-R'
 ENV REBUILD_CODE=true
 
 ARG APACHE_NEXUS=https://github.com/apache/incubator-sdap-nexus.git
-ARG APACHE_NEXUS_BRANCH=master
+ARG APACHE_NEXUS_COMMIT=be19c1d567301b09269e851cc5b5af55fea02c5d

Review Comment:
   This commit is from 2019... ideally, I think we'd like this to run on the latest code? What issues did you run into when running this on master?
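   
   A rough sketch, assuming the image is built from the `docker/jupyter` context: if the commit does stay pinned as an ARG, it can at least be overridden at build time without editing the Dockerfile.
   
   ```bash
   # Sketch only: build the Jupyter image against a different commit by overriding the ARG.
   # The build context path (docker/jupyter) and the tag are assumptions for illustration.
   docker build \
     --build-arg APACHE_NEXUS_COMMIT=<newer-commit-sha> \
     -t nexusjpl/jupyter:${JUPYTER_VERSION} \
     docker/jupyter
   ```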



##########
docker/jupyter/requirements.txt:
##########
@@ -1,4 +1,3 @@
-shapely
-requests
-numpy
-cassandra-driver==3.9.0

Review Comment:
   I'm curious why we were able to remove this? Or why it was needed before but not now?
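   
   One quick way to check, assuming the `nexusjpl/jupyter` image and tag referenced in the quickstart: list the installed packages inside the image and confirm these are already pulled in by the base image or other requirements.
   
   ```bash
   # Sketch only: verify the removed packages are already present in the jupyter image.
   # Image name and tag follow the quickstart variables; adjust as needed.
   docker run --rm nexusjpl/jupyter:${JUPYTER_VERSION} \
     pip list | grep -iE 'shapely|requests|numpy|cassandra-driver'
   ```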



##########
docs/quickstart.rst:
##########
@@ -64,181 +80,240 @@ The network we will be using for this quickstart will be called ``sdap-net``. Cr
 
 .. _quickstart-step3:
 
-Download Sample Data
----------------------
+Start Ingester Components and Ingest Some Science Data
+========================================================
 
-The data we will be downloading is part of the `AVHRR OI dataset <https://podaac.jpl.nasa.gov/dataset/AVHRR_OI-NCEI-L4-GLOB-v2.0>`_ which measures sea surface temperature. We will download 1 month of data and ingest it into a local Solr and Cassandra instance.
+Create Data Directory
+------------------------
+
+Let's start by creating the directory to hold the science data to ingest.
 
 Choose a location that is mountable by Docker (typically needs to be under the User's home directory) to download the data files to.
 
 .. code-block:: bash
 
-  export DATA_DIRECTORY=~/nexus-quickstart/data/avhrr-granules
-  mkdir -p ${DATA_DIRECTORY}
+    export DATA_DIRECTORY=~/nexus-quickstart/data/avhrr-granules
+    mkdir -p ${DATA_DIRECTORY}
 
-Then go ahead and download 1 month worth of AVHRR netCDF files.
+Now we can start up the data storage components. We will be using Solr and Cassandra to store the tile metadata and data respectively.
 
-.. code-block:: bash
+.. _quickstart-step4:
 
-  cd $DATA_DIRECTORY
+Start Zookeeper
+---------------
 
-  export URL_LIST="https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/305/20151101120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/306/20151102120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/307/20151103120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/308/20151104120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/309/20151105120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/310/20151106120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/311/20151107120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/312/20151108120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/313/20151109120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/314/20151110120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/315/20151111120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/316/20151112120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/317/20151113120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/318/20151114120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/319/20151115120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/320/20151116120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/321/20151117120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/322/20151118120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/323/20151119120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/324/20151120120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/325/20151121120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/326/20151122120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/327/20151123120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/328/20151124120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/329/20151125120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/330/20151126120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/331/20151127120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/332/20151128120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/333/20151129120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc https://podaac-opendap.jpl.nasa.gov:443/opendap/allData/ghrsst/data/GDS2/L4/GLOB/NCEI/AVHRR_OI/v2/2015/334/20151130120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0.nc"
+In order to run Solr in cloud mode, we must first run Zookeeper.
 
-  for url in ${URL_LIST}; do
-    curl -O "${url}"
-  done
+.. code-block:: bash
 
-You should now have 30 files downloaded to your data directory, one for each day in November 2015.
+    docker run --name zookeeper -dp 2181:2181 zookeeper:${ZK_VERSION}
 
-Start Data Storage Containers
-==============================
+We then need to ensure the ``/solr`` znode is present.
 
-We will use Solr and Cassandra to store the tile metadata and data respectively.
+.. code-block:: bash
 
-.. _quickstart-step4:
+  docker exec zookeeper bash -c "bin/zkCli.sh create /solr"
+
+.. _quickstart-step5:
 
 Start Solr
 -----------
 
-SDAP is tested with Solr version 7.x with the JTS topology suite add-on installed. The SDAP docker image is based off of the official Solr image and simply adds the JTS topology suite and the nexustiles core.
+SDAP is tested with Solr version 8.11.1.
 
-.. note:: Mounting a volume is optional but if you choose to do it, you can start and stop the Solr container without having to reingest your data every time. If you do not mount a volume, every time you stop your Solr container the data will be lost.
+.. note:: Mounting a volume is optional but if you choose to do it, you can start and stop the Solr container without having to reingest your data every time. If you do not mount a volume, every time you stop your Solr container the data will be lost. If you don't want a volume, leave off the ``-v`` option in the following ``docker run`` command.
 
 To start Solr using a volume mount and expose the admin webapp on port 8983:
 
 .. code-block:: bash
 
   export SOLR_DATA=~/nexus-quickstart/solr
-  docker run --name solr --network sdap-net -v ${SOLR_DATA}:/opt/solr/server/solr/nexustiles/data -p 8983:8983 -d sdap/solr-singlenode:${VERSION}
+  mkdir -p ${SOLR_DATA}
+  docker run --name solr --network sdap-net -v ${SOLR_DATA}/:/opt/solr/server/solr/nexustiles/data -p 8983:8983 -e ZK_HOST="host.docker.internal:2181/solr" -d nexusjpl/solr:${SOLR_VERSION}
+
+This will start an instance of Solr. To initialize it, we need to run the ``solr-cloud-init`` image.
 
-If you don't want to use a volume, leave off the ``-v`` option.
+.. code-block:: bash
 
+  docker run -it --rm --name solr-init --network sdap-net -e SDAP_ZK_SOLR="host.docker.internal:2181/solr" -e SDAP_SOLR_URL="http://host.docker.internal:8983/solr/" -e CREATE_COLLECTION_PARAMS="name=nexustiles&numShards=1&waitForFinalState=true" nexusjpl/solr-cloud-init:${SOLR_CLOUD_INIT_VERSION}
 
-.. _quickstart-step5:
+When the init script finishes, kill the container by typing ``Ctrl + C``
 
-Start Cassandra
-----------------
+.. _quickstart-step6:
 
-SDAP is tested with Cassandra version 2.2.x. The SDAP docker image is based off of the official Cassandra image and simply mounts the schema DDL script into the container for easy initialization.
+Starting Cassandra
+-------------------
+
+SDAP is tested with Cassandra version 3.11.6.
 
-.. note:: Similar to the Solr container, using a volume is recommended but not required.
+.. note:: Similar to the Solr container, using a volume is recommended but not required. Be aware that the second ``-v`` option is required.
 
-To start cassandra using a volume mount and expose the connection port 9042:
+Before starting Cassandra, we need to prepare a script to initialize the database.
+
+.. code-block:: bash
+
+  export CASSANDRA_INIT=~/nexus-quickstart/init
+  mkdir -p ${CASSANDRA_INIT}
+  cat << EOF >> ${CASSANDRA_INIT}/initdb.cql
+  CREATE KEYSPACE IF NOT EXISTS nexustiles WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 1 };
+
+  CREATE TABLE IF NOT EXISTS nexustiles.sea_surface_temp  (
+  tile_id    	uuid PRIMARY KEY,
+  tile_blob  	blob
+  );
+  EOF
+
+Now we can start the image and run the initialization script.
 
 .. code-block:: bash
 
   export CASSANDRA_DATA=~/nexus-quickstart/cassandra
-  docker run --name cassandra --network sdap-net -p 9042:9042 -v ${CASSANDRA_DATA}:/var/lib/cassandra -d sdap/cassandra:${VERSION}
+  mkdir -p ${CASSANDRA_DATA}
+  docker run --name cassandra --network sdap-net -p 9042:9042 -v ${CASSANDRA_DATA}/cassandra/:/var/lib/cassandra -v "${CASSANDRA_INIT}/initdb.cql:/scripts/initdb.cql" -d bitnami/cassandra:${CASSANDRA_VERSION}
 
-.. _quickstart-step6:
+Wait a few moments for the database to start.
+
+.. code-block:: bash
+
+  docker exec  cassandra bash -c "cqlsh -u cassandra -p cassandra -f /scripts/initdb.cql"
 
-Ingest Data
-============
+With Solr and Cassandra started and initialized, we can now start the collection manager and granule ingester(s).
 
-Now that Solr and Cassandra have both been started and configured, we can ingest some data. NEXUS ingests data using the ningester docker image. This image is designed to read configuration and data from volume mounts and then tile the data and save it to the datastores. More information can be found in the :ref:`ningester` section.
+.. _quickstart-step7:
 
-Ningester needs 3 things to run:
+Start RabbitMQ
+----------------
+
+The collection manager and granule ingester(s) use RabbitMQ to communicate, so we need to start that up first.
+
+.. code-block:: bash
 
-#. Tiling configuration. How should the dataset be tiled? What is the dataset called? Are there any transformations that need to happen (e.g. kelvin to celsius conversion)? etc...
-#. Connection configuration. What should be used for metadata storage and where can it be found? What should be used for data storage and where can it be found?
-#. Data files. The data that will be ingested.
+  docker run -dp 5672:5672 -p 15672:15672 --name rmq --network sdap-net bitnami/rabbitmq:${RMQ_VERSION}
 
-Tiling configuration
+.. _quickstart-step8:
+
+Start the Granule Ingester(s)
+-----------------------------
+
+The granule ingester(s) read new granules from the message queue and process them into tiles. For the set of granules we will be using in this guide, we recommend using two ingester containers to speed up the process.
+
+.. code-block:: bash
+
+  docker run --name granule-ingester-1 --network sdap-net -e RABBITMQ_HOST="host.docker.internal:5672" -e RABBITMQ_USERNAME="user" -e RABBITMQ_PASSWORD="bitnami" -d -e CASSANDRA_CONTACT_POINTS=host.docker.internal -e CASSANDRA_USERNAME=cassandra -e CASSANDRA_PASSWORD=cassandra -e SOLR_HOST_AND_PORT="http://host.docker.internal:8983" -v ${DATA_DIRECTORY}:/data/granules/ nexusjpl/granule-ingester:${GRANULE_INGESTER_VERSION}

Review Comment:
   Perhaps some newlines here to make this more readable
   
   ```bash
   docker run --name granule-ingester-1 --network sdap-net -e RABBITMQ_HOST="host.docker.internal:5672" \
   	-e RABBITMQ_USERNAME="user" -e RABBITMQ_PASSWORD="bitnami" -d -e CASSANDRA_CONTACT_POINTS=host.docker.internal \
   	-e CASSANDRA_USERNAME=cassandra -e CASSANDRA_PASSWORD=cassandra -e SOLR_HOST_AND_PORT="http://host.docker.internal:8983" \
   	-v ${DATA_DIRECTORY}:/data/granules/ nexusjpl/granule-ingester:${GRANULE_INGESTER_VERSION}
   ```



##########
docs/quickstart.rst:
##########
@@ -64,181 +80,240 @@ The network we will be using for this quickstart will be called ``sdap-net``. Cr
 
 .. _quickstart-step3:
 
-Download Sample Data
----------------------
+Start Ingester Components and Ingest Some Science Data

Review Comment:
   Sorry for being nitpicky, but I don't think this makes a very good "H1" header. Maybe something simpler like "Ingest Data", or break it apart further?



##########
docs/quickstart.rst:
##########
@@ -18,43 +18,59 @@ This quickstart guide will walk you through how to install and run NEXUS on your
 Prerequisites
 ==============
 
-* Docker (tested on v18.03.1-ce)
+* Docker (tested on v20.10.17)
 * Internet Connection
-* bash
+* bash or zsh
 * cURL
-* 500 MB of disk space
+* 10.5 GB of disk space

Review Comment:
   😱 why is so much disk space needed??
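   
   For reference, the bulk of that is presumably the pulled images plus the downloaded granules and the Solr/Cassandra volumes; actual usage on a given machine can be checked with standard Docker commands.
   
   ```bash
   # Summarize overall Docker disk usage (images, containers, local volumes).
   docker system df
   
   # Per-image sizes for the images pulled in this quickstart.
   docker images | grep -E 'cassandra|rabbitmq|nexusjpl|zookeeper'
   ```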



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@sdap.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org