Posted to commits@sdap.apache.org by nc...@apache.org on 2022/08/09 18:48:40 UTC
[incubator-sdap-nexus] branch master updated: SDAP-390 Update NetCDF reader tool for data match-up (#178)

This is an automated email from the ASF dual-hosted git repository.

nchung pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-nexus.git


The following commit(s) were added to refs/heads/master by this push:
     new 1dc62e2  SDAP-390 Update NetCDF reader tool for data match-up (#178)
1dc62e2 is described below

commit 1dc62e2d9f61c84099982f9c4d13337899511692
Author: JordanGethers <47...@users.noreply.github.com>
AuthorDate: Tue Aug 9 14:48:36 2022 -0400

    SDAP-390 Update NetCDF reader tool for data match-up (#178)
    
    * SDAP-390 Update NetCDF reader tool for data match-up
    
    * Update CHANGELOG.md
    
    * Update cdms_reader.py
    
    * Update README.md
    
    * Update cdms_reader.py
    
    * Updated README.md.
    
    Co-authored-by: Jordan Gethers <jg...@mdc-dev-proc.coaps.fsu.edu>
    Co-authored-by: nchung <ng...@jpl.nasa.gov>
---
 CHANGELOG.md                |   7 +-
 tools/cdms/README.md        |  85 +++++++++++++++
 tools/cdms/cdms_reader.py   | 250 ++++++++++++++++++++++++++++++++++++++++++++
 tools/cdms/requirements.txt | 205 ++++++++++++++++++++++++++++++++++++
 tools/doms/README.md        |  66 ------------
 tools/doms/doms_reader.py   | 144 -------------------------
 6 files changed, 546 insertions(+), 211 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d7cf791..aa47d27 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,8 +12,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - SDAP-372: Updated `match_spark_doms` to interface with samos_cdms endpoint 
 - SDAP-393: Included `insitu` in ingress based on the value of `insituAPI.enabled` in `values.yaml`
 - SDAP-371: Renamed `/domssubset` endpoint to `/cdmssubset`
+- SDAP-390: Updated NetCDF reader tool for data matchup and added user functionality.
 - SDAP-396: Added saildrone insitu api to matchup
 ### Changed
+
+- SDAP-390: Changed `tools/doms` to `tools/cdms` and `doms_reader.py` to `cdms_reader.py`
 - domslist endpoint points to AWS insitu instead of doms insitu
 ### Deprecated
 ### Removed
@@ -32,5 +35,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Fixed issue where satellite to satellite matchups with the same dataset don't return the expected result
 - Fixed CSV and NetCDF matchup output bug
 - Fixed NetCDF output switching latitude and longitude
+
+### Security
 - Fixed import error causing `/timeSeriesSpark` queries to fail.
-### Security
\ No newline at end of file
+
diff --git a/tools/cdms/README.md b/tools/cdms/README.md
new file mode 100644
index 0000000..23111de
--- /dev/null
+++ b/tools/cdms/README.md
@@ -0,0 +1,85 @@
+# cdms_reader.py
+The functions in cdms_reader.py read a CDMS netCDF file into memory, assemble a list of matches from a primary (satellite) data set and a secondary (satellite or in situ) data set, and optionally output the matches to a CSV file. Each matched pair contains one primary data record and one secondary data record.
+
+Each CDMS netCDF file holds two groups (`PrimaryData` and `SecondaryData`). The `matchIDs` netCDF variable contains pairs of IDs (matches) which reference a primary data record and a secondary data record in their respective groups. These records have a many-to-many relationship; one primary record may match many secondary records, and one secondary record may match many primary records. The `assemble_matches` function assembles the individual data records into pairs based o [...]
+
+## Requirements
+This tool was developed and tested with Python 3.9.13.
+Imported packages:
+* argparse
+* string
+* netcdf4
+* sys
+* datetime
+* csv
+* collections
+* logging
+    
+
+## Functions
+### Function: `assemble_matches(filename)`
+Read a CDMS netCDF file into memory and return a list of matches from the file.
+
+#### Parameters 
+- `filename` (str): the CDMS netCDF file name.
+    
+#### Returns
+- `matches` (list): List of matches. 
+
+Each list element in `matches` is a dictionary organized as follows:
+    For match `m`, netCDF group `GROUP` ('PrimaryData' or 'SecondaryData'), and netCDF group variable `VARIABLE`:
+
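+- `matches[m][GROUP]['matchID']`: netCDF `MatchedRecords` dimension ID for the match
+- `matches[m][GROUP]['GROUPID']`: GROUP netCDF `dim` dimension ID for the record (e.g. `PrimaryDataID`)
+- `matches[m][GROUP][VARIABLE]`: variable value; each group also gets a `datetime` entry converted from its `time` variable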
+
+For example, to access the timestamps of the primary data and the secondary data of the first match in the list, along with the `MatchedRecords` dimension ID and the groups' `dim` dimension ID:
+```python
+matches[0]['PrimaryData']['time']
+matches[0]['SecondaryData']['time']
+matches[0]['PrimaryData']['matchID']
+matches[0]['PrimaryData']['PrimaryDataID']
+matches[0]['SecondaryData']['SecondaryDataID']
+```
+
+        
+### Function: `matches_to_csv(matches, csvfile)`
+Write the CDMS matches to a CSV file. Include a header of column names which are based on the group and variable names from the netCDF file.
+    
+#### Parameters
+- `matches` (list): the list of dictionaries containing the CDMS matches as returned from the `assemble_matches` function.
+- `csvfile` (str): the name of the CSV output file.
+
+### Function: `get_globals(filename)`
+Write the CDMS global attributes to a text file named after the input file
+(`.nc` replaced by `.txt`). The text file also begins with a short description
+of what the program outputs and how best to utilize it.
+
+#### Parameters
+- `filename` (str): the name of the original '.nc' input file
+
+### Function: `create_logs(user_option, logName)`
+Write the CDMS log information to a file. Additionally, the user may
+opt to print this information directly to stdout, or discard it entirely.
+
+#### Parameters
+- `user_option` (str): the value of the `-l`/`--log` option: `'N'` (bare `-l`)
+disables logging, `'1'` (`-l1`) prints the log to stdout, and any other value writes it to `logName`.
+- `logName` (str): the name of the log file to write to when logging to a file
+(the input file name with `.nc` replaced by `.log`).
+
+## Usage
+For example, to read some CDMS netCDF file called `cdms_file.nc`:
+### Command line
+The main function for `cdms_reader.py` takes one `filename` parameter (`cdms_file.nc` in this example) for the CDMS netCDF file to read and calls the `assemble_matches` function. If the `-c` option is used, the `matches_to_csv` function is called to write the matches to a CSV file `cdms_file.csv`. If the `-g` option is used, the `get_globals` function is called to write the file's global attributes, along with a short explanation of how the files can best be utilized [...]
+```
+python cdms_reader.py cdms_file.nc -c -g
+python3 cdms_reader.py cdms_file.nc -c -g
+python3 cdms_reader.py cdms_file.nc --csv --meta
+```
+
+### Importing `assemble_matches`
+```python
+from cdms_reader import assemble_matches
+matches = assemble_matches('cdms_file.nc')
+```
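The nested layout returned by `assemble_matches` can be explored without a real netCDF file. The sketch below hand-builds a hypothetical three-match list with the same key layout (group name, then `matchID`, `<GROUP>ID` and the group variables; real entries also carry every group variable plus a `datetime` field). It then derives the CSV column names the same way `matches_to_csv` does and groups secondary IDs under their primary ID to make the many-to-many relationship visible. The helpers `make_match`, `csv_header`, `secondaries_by_primary` and `time_offsets` are illustrative only and are not part of cdms_reader.py, and the time offsets assume both groups share the same `time` units.

```python
from collections import OrderedDict, defaultdict


def make_match(match_id, primary_id, secondary_id, p_time, s_time):
    """Hypothetical stand-in for one element of assemble_matches() output.

    Real entries also hold every group variable and a 'datetime' field.
    """
    return OrderedDict([
        ("PrimaryData", OrderedDict([
            ("matchID", match_id),
            ("PrimaryDataID", primary_id),
            ("time", p_time),
        ])),
        ("SecondaryData", OrderedDict([
            ("matchID", match_id),
            ("SecondaryDataID", secondary_id),
            ("time", s_time),
        ])),
    ])


# Three matches: primary record 0 pairs with secondary records 0 and 1,
# and secondary record 1 also pairs with primary record 1 (many-to-many).
matches = [
    make_match(0, 0, 0, 1000, 1030),
    make_match(1, 0, 1, 1000, 940),
    make_match(2, 1, 1, 2000, 1990),
]


def csv_header(matches):
    """GROUP_VARIABLE column names, built from the first match as matches_to_csv does."""
    return [group + "_" + var
            for group, data in matches[0].items()
            for var in data.keys()]


def secondaries_by_primary(matches):
    """Map each primary record ID to the secondary record IDs it matched."""
    grouped = defaultdict(list)
    for m in matches:
        grouped[m["PrimaryData"]["PrimaryDataID"]].append(
            m["SecondaryData"]["SecondaryDataID"])
    return dict(grouped)


def time_offsets(matches):
    """Secondary minus primary 'time' per match, in the file's time units."""
    return [m["SecondaryData"]["time"] - m["PrimaryData"]["time"] for m in matches]


print(csv_header(matches))
print(secondaries_by_primary(matches))  # {0: [0, 1], 1: [1]}
print(time_offsets(matches))            # [30, -60, -10]
```

With a real file, the same functions could be applied to `assemble_matches('cdms_file.nc')`; if the two groups store `time` in different units, comparing the `datetime` fields would be the safer choice.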
diff --git a/tools/cdms/cdms_reader.py b/tools/cdms/cdms_reader.py
new file mode 100644
index 0000000..4590cc4
--- /dev/null
+++ b/tools/cdms/cdms_reader.py
@@ -0,0 +1,250 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import string
+from netCDF4 import Dataset, num2date
+import sys
+import datetime
+import csv
+from collections import OrderedDict
+import logging
+
+#TODO: Get rid of numpy errors?
+#TODO: Update big SDAP README
+
+LOGGER = logging.getLogger("cdms_reader")
+
+def assemble_matches(filename):
+    """
+    Read a CDMS netCDF file and return a list of matches.
+    
+    Parameters
+    ----------
+    filename : str
+        The CDMS netCDF file name.
+    
+    Returns
+    -------
+    matches : list
+        List of matches. Each list element is a dictionary.
+        For match m, netCDF group GROUP (PrimaryData or SecondaryData), and
+        group variable VARIABLE:
+        matches[m][GROUP]['matchID']: MatchedRecords dimension ID for the match
+        matches[m][GROUP]['GROUPID']: GROUP dim dimension ID for the record
+        matches[m][GROUP][VARIABLE]: variable value 
+    """
+   
+    try:
+        # Open the netCDF file
+        with Dataset(filename, 'r') as cdms_nc:
+            # Check that the number of groups is consistent w/ the MatchedGroups
+            # dimension
+            assert len(cdms_nc.groups) == cdms_nc.dimensions['MatchedGroups'].size,\
+                ("Number of groups isn't the same as MatchedGroups dimension.")
+            
+            matches = []
+            matched_records = cdms_nc.dimensions['MatchedRecords'].size
+            
+            # Loop through the match IDs to assemble matches
+            for match in range(0, matched_records):
+                match_dict = OrderedDict()
+                # Grab the data from each platform (group) in the match
+                for group_num, group in enumerate(cdms_nc.groups):
+                    match_dict[group] = OrderedDict()
+                    match_dict[group]['matchID'] = match
+                    ID = cdms_nc.variables['matchIDs'][match][group_num]
+                    match_dict[group][group + 'ID'] = ID
+                    for var in cdms_nc.groups[group].variables.keys():
+                        match_dict[group][var] = cdms_nc.groups[group][var][ID]
+                    
+                    # Create a UTC datetime field from timestamp
+                    dt = num2date(match_dict[group]['time'],
+                                  cdms_nc.groups[group]['time'].units)
+                    match_dict[group]['datetime'] = dt
+                LOGGER.info(match_dict)
+                matches.append(match_dict)
+            
+            return matches
+    except (OSError, IOError) as err:
+        LOGGER.exception("Error reading netCDF file " + filename)
+        raise err
+    
+def matches_to_csv(matches, csvfile):
+    """
+    Write the CDMS matches to a CSV file. Include a header of column names
+    which are based on the group and variable names from the netCDF file.
+    
+    Parameters
+    ----------
+    matches : list
+        The list of dictionaries containing the CDMS matches as returned from
+        assemble_matches.      
+    csvfile : str
+        The name of the CSV output file.
+    """
+    # Create a header for the CSV. Column names are GROUP_VARIABLE or
+    # GROUP_GROUPID.
+    header = []
+    for key, value in matches[0].items():
+        for otherkey in value.keys():
+            header.append(key + "_" + otherkey)
+    
+    try:
+        # Write the CSV file
+        with open(csvfile, 'w') as output_file:
+            csv_writer = csv.writer(output_file)
+            csv_writer.writerow(header)
+            for match in matches:
+                row = []
+                for group, data in match.items():
+                    for value in data.values():
+                        row.append(value)
+                csv_writer.writerow(row)
+    except (OSError, IOError) as err:
+        LOGGER.exception("Error writing CSV file " + csvfile)
+        raise err
+
+def get_globals(filename):
+    """
+    Write the CDMS global attributes to a text file. Additionally,
+    within the file there will be a description of where all the different
+    outputs go and how to best utilize this program.
+    
+    Parameters
+    ----------      
+    filename : str
+        The name of the original '.nc' input file.
+    
+    """
+    x0 = "README / cdms_reader.py Program Use and Description:\n"
+    x1 = "\nThe cdms_reader.py program reads a CDMS netCDF (a NETCDF file with a matchIDs variable)\n"
+    x2 = "file into memory, assembles a list of matches of primary and secondary data\n"
+    x3 = "and optionally\n"
+    x4 = "output the matches to a CSV file. Each matched pair contains one primary\n"
+    x5 = "data record and one secondary data record.\n"
+    x6 = "\nBelow, this file will list the global attributes of the .nc (NETCDF) file.\n"
+    x7 = "If you wish to see a full dump of the data from the .nc file,\n"
+    x8 = "please utilize the ncdump command from NETCDF (or look at the CSV file).\n"
+    try:
+        with Dataset(filename, "r", format="NETCDF4") as ncFile:
+            txtName = filename.replace(".nc", ".txt")
+            with open(txtName, "w") as txt:
+                txt.write(x0 + x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)
+                txt.write("\nGlobal Attributes:\n")
+                for x in ncFile.ncattrs():
+                    txt.write(f'\t :{x} = "{ncFile.getncattr(x)}" ;\n')
+
+
+    except (OSError, IOError) as err:
+        LOGGER.exception("Error reading netCDF file " + filename)
+        print("Error reading file!")
+        raise err
+
+def create_logs(user_option, logName):
+    """
+    Write the CDMS log information to a file. Additionally, the user may
+    opt to print this information directly to stdout, or discard it entirely.
+    
+    Parameters
+    ----------      
+    user_option : str
+        The value of the -l/--log option: 'N' disables logging, '1'
+        prints the log to stdout, and any other value writes it to logName.
+    logName : str
+        The name of the log file we wish to write to,
+        assuming the user did not use the -l option.
+    """
+    if user_option == 'N':
+        print("** Note: No log was created **")
+
+
+    elif user_option == '1':
+        #prints the log contents to stdout
+        logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s',
+                        level=logging.INFO,
+                        datefmt='%Y-%m-%d %H:%M:%S',
+                        handlers=[
+                            logging.StreamHandler(sys.stdout)
+                            ])
+                
+    else:
+        #prints log to a .log file
+        logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s',
+                        level=logging.INFO,
+                        datefmt='%Y-%m-%d %H:%M:%S',
+                        handlers=[
+                            logging.FileHandler(logName)
+                            ])
+        if user_option != 'Y':
+            print(f"** Bad usage of log option. Log will print to {logName} **")
+
+    
+
+
+
+if __name__ == '__main__':
+    """
+    Execution:
+        python cdms_reader.py filename
+        OR
+        python3 cdms_reader.py filename 
+        OR
+        python3 cdms_reader.py filename -c -g 
+        OR
+        python3 cdms_reader.py filename --csv --meta
+
+    Note (For Help Try):
+            python3 cdms_reader.py -h
+            OR
+            python3 cdms_reader.py --help
+
+    """
+   
+    u0 = '\n%(prog)s -h OR --help \n'
+    u1 = '%(prog)s filename -c -g\n%(prog)s filename --csv --meta\n'
+    u2 ='Use -l OR -l1 to modify destination of logs'
+    p = argparse.ArgumentParser(usage= u0 + u1 + u2)
+
+    #below block is to customize user options
+    p.add_argument('filename', help='CDMS netCDF file to read')
+    p.add_argument('-c', '--csv', nargs='?', const= 'Y', default='N',
+     help='Use -c or --csv to retrieve CSV output')
+    p.add_argument('-g', '--meta', nargs='?', const='Y', default='N',
+     help='Use -g or --meta to retrieve global attributes / metadata')
+    p.add_argument('-l', '--log', nargs='?', const='N', default='Y',
+     help='Use -l or --log to AVOID creating log files, OR use -l1 to print to stdout/console') 
+
+    #arguments are processed by the next line
+    args = p.parse_args()
+
+    logName = args.filename.replace(".nc", ".log")
+    create_logs(args.log, logName)
+    
+    cdms_matches = assemble_matches(args.filename)
+
+    if args.csv == 'Y':
+        matches_to_csv(cdms_matches, args.filename.replace(".nc",".csv"))
+
+    if args.meta == 'Y':
+        get_globals(args.filename)
+
+
+
+
+    
+
+    
+    
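The `-c`, `-g` and `-l` options in cdms_reader.py use argparse's `nargs='?'` with distinct `const` and `default` values, so the stored value depends on whether a flag is absent, given bare, or given with an attached value such as `-l1`. The sketch below rebuilds only those option definitions (usage and help strings omitted) to show the resulting values. It also includes a hypothetical `output_names` helper, not part of cdms_reader.py, that derives the `.csv`, `.txt` and `.log` names with `pathlib.Path.with_suffix` instead of `str.replace`: `replace` rewrites every `.nc` in the name, and when the name contains no `.nc` at all it returns the input name unchanged, so the derived output path would be the input file itself.

```python
import argparse
from pathlib import Path


def build_parser():
    """Rebuild the cdms_reader.py option definitions (help/usage omitted)."""
    p = argparse.ArgumentParser(prog="cdms_reader.py")
    p.add_argument("filename")
    # Absent -> default, bare flag -> const, attached value (e.g. -l1) -> that value.
    p.add_argument("-c", "--csv", nargs="?", const="Y", default="N")
    p.add_argument("-g", "--meta", nargs="?", const="Y", default="N")
    p.add_argument("-l", "--log", nargs="?", const="N", default="Y")
    return p


def output_names(nc_path):
    """Hypothetical helper: swap only the final suffix of the input name."""
    path = Path(nc_path)
    return {ext: str(path.with_suffix(ext)) for ext in (".csv", ".txt", ".log")}


if __name__ == "__main__":
    parser = build_parser()
    for argv in (["cdms_file.nc"],
                 ["cdms_file.nc", "-c", "-g"],
                 ["cdms_file.nc", "-l"],
                 ["cdms_file.nc", "-l1"]):
        args = parser.parse_args(argv)
        # Prints csv, meta and log values: N N Y / Y Y Y / N N N / N N 1
        print(argv, "->", args.csv, args.meta, args.log)

    print("my.ncdata.nc".replace(".nc", ".csv"))  # my.csvdata.csv
    print(output_names("my.ncdata.nc")[".csv"])    # my.ncdata.csv
    print(output_names("cdms_file")[".csv"])       # cdms_file.csv
```

Because these options accept an optional value, placing `-l` directly before the filename (for example `-l cdms_file.nc`) would likely make argparse take the filename as the log option's value and then report the positional argument as missing; the README examples avoid this by putting the flags after the filename.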
diff --git a/tools/cdms/requirements.txt b/tools/cdms/requirements.txt
new file mode 100644
index 0000000..88945ac
--- /dev/null
+++ b/tools/cdms/requirements.txt
@@ -0,0 +1,205 @@
+anyio==3.6.1
+appdirs==1.4.4
+argcomplete==1.12.0
+argon2-cffi==21.3.0
+argon2-cffi-bindings==21.2.0
+asn1crypto==1.4.0
+astroid==2.6.6
+asttokens==2.0.5
+atomicwrites==1.4.0
+attrs==20.3.0
+Babel==2.10.1
+backcall==0.1.0
+basemap==1.2.1
+beautifulsoup4==4.9.3
+bleach==5.0.0
+Bottleneck==1.2.1
+Cartopy==0.19.0
+ceph==1.0.0
+cephfs==2.0.0
+certifi==2020.12.5
+cffi==1.14.5
+cftime==1.4.1
+chardet==4.0.0
+click==8.1.3
+click-plugins==1.1.1
+cligj==0.7.2
+cloudpickle==1.6.0
+conda==4.10.1
+conda-package-handling==1.7.2
+configobj==5.0.6
+configparser==5.0.2
+cryptography==3.4.6
+cycler==0.10.0
+Cython==0.29.21
+cytoolz==0.11.0
+dasbus==1.4
+dbus-python==1.2.18
+debugpy==1.6.0
+decorator==4.4.2
+defusedxml==0.7.1
+distro==1.5.0
+entrypoints==0.4
+executing==0.8.3
+fail2ban==0.11.2
+fastjsonschema==2.15.3
+Fiona==1.8.21
+frozendict==1.2
+future==0.18.2
+GDAL==3.2.2
+geopandas==0.10.2
+Glances==3.1.4.1
+gpg==1.15.1
+idna==2.10
+importlib-metadata==4.11.4
+iniconfig==1.1.1
+iniparse==0.4
+iotop==0.6
+ipykernel==6.13.0
+ipython==8.4.0
+ipython-genutils==0.1.0
+isc==2.0
+isort==5.7.0
+jedi==0.17.2
+Jinja2==3.1.2
+joblib==1.0.1
+json5==0.9.8
+jsonschema==4.5.1
+jupyter-client==7.3.1
+jupyter-core==4.10.0
+jupyter-server==1.17.0
+jupyterlab==3.4.2
+jupyterlab-pygments==0.2.2
+jupyterlab-server==2.14.0
+kiwisolver==1.3.2
+lazy-object-proxy==1.6.0
+libarchive-c==2.9
+libcomps==0.1.18
+lxml==4.6.5
+lz4==3.0.2
+MarkupSafe==2.1.1
+matplotlib==3.4.3
+matplotlib-inline==0.1.3
+mccabe==0.6.1
+MetPy==1.1.0
+mistune==0.8.4
+mock==3.0.5
+more-itertools==8.12.0
+munch==2.5.0
+mysql-connector-python==8.0.21
+mysqlclient==1.4.6
+nbclassic==0.3.7
+nbclient==0.6.4
+nbconvert==6.5.0
+nbformat==5.4.0
+nest-asyncio==1.5.5
+netCDF4==1.5.5.1
+nftables==0.1
+notebook==6.4.11
+notebook-shim==0.1.0
+ntp==1.2.1
+numexpr==2.7.1
+numpy==1.20.1
+olefile==0.46
+OWSLib==0.21.0
+packaging==20.9
+pandas==1.2.5
+pandas-datareader==0.9.0
+pandocfilters==1.5.0
+parso==0.8.0
+pathlib2==2.3.6
+pbr==5.5.1
+perf==0.1
+pexpect==4.8.0
+pickle5==0.0.11
+pickleshare==0.7.5
+Pillow==8.1.2
+Pint==0.17
+pluggy==0.13.1
+ply==3.11
+podaacpy==2.4.0
+pooch==1.5.1
+prettytable==0.7.2
+prometheus-client==0.14.1
+prompt-toolkit==3.0.5
+protobuf==3.14.0
+psutil==5.8.0
+ptyprocess==0.6.0
+pure-eval==0.2.2
+py==1.11.0
+py-cpuinfo==7.0.0
+py3nvml==0.2.6
+pycairo==1.20.1
+pycosat==0.6.3
+pycparser==2.20
+Pygments==2.7.4
+PyGObject==3.40.1
+pykdtree==1.3.4
+pylint==2.9.6
+PyMySQL==0.10.1
+pyOpenSSL==21.0.0
+pyparsing==2.4.7
+pyproj==3.2.0
+PyQt4-sip==4.19.24
+PyQt5==5.15.0
+PyQt5-sip==4.19.24
+pyrsistent==0.18.1
+pyshp==2.1.3
+PySocks==1.7.1
+pytest==6.2.2
+pytest-runner==4.0
+python-dateutil==2.8.2
+python-dmidecode==3.12.2
+python-linux-procfs==0.6.3
+pytz==2021.3
+pyudev==0.22.0
+PyYAML==5.4.1
+pyzmq==23.0.0
+rados==2.0.0
+rbd==2.0.0
+requests==2.25.1
+requests-file==1.5.1
+rgw==2.0.0
+rpm==4.16.1.3
+ruamel.yaml==0.16.6
+ruamel.yaml.clib==0.1.2
+ruptures==1.1.4
+scikit-learn==0.24.1
+scipy==1.6.2
+selinux==3.2
+Send2Trash==1.8.0
+sepolicy==3.2
+setools==4.4.0
+setroubleshoot==1.1
+Shapely==1.7.1
+simplegeneric==0.8.1
+six==1.15.0
+slip==0.6.4
+slip.dbus==0.6.4
+smartcols==0.3.0
+sniffio==1.2.0
+sos==4.1
+soupsieve==2.2.1
+SSSDConfig==2.5.2
+stack-data==0.2.0
+systemd-python==234
+tables==3.6.1
+terminado==0.15.0
+threadpoolctl==2.0.0
+tinycss2==1.1.1
+toml==0.10.2
+toolz==0.11.1
+tornado==6.1
+tqdm==4.62.1
+traitlets==5.2.2.post1
+typed-ast==1.4.3
+urllib3==1.25.10
+wcwidth==0.2.5
+webencodings==0.5.1
+websocket-client==1.3.2
+wrapt==1.12.1
+xarray==0.19.0
+xlrd==2.0.1
+xlwt==1.3.0
+xmltodict==0.12.0
+zipp==3.8.0
diff --git a/tools/doms/README.md b/tools/doms/README.md
deleted file mode 100644
index c49fa4a..0000000
--- a/tools/doms/README.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# doms_reader.py
-The functions in doms_reader.py read a DOMS netCDF file into memory, assemble a list of matches of satellite and in situ data, and optionally output the matches to a CSV file. Each matched pair contains one satellite data record and one in situ data record.
-
-The DOMS netCDF files hold satellite data and in situ data in different groups (`SatelliteData` and `InsituData`). The `matchIDs` netCDF variable contains pairs of IDs (matches) which reference a satellite data record and an in situ data record in their respective groups. These records have a many-to-many relationship; one satellite record may match to many in situ records, and one in situ record may match to many satellite records. The `assemble_matches` function assembles the individua [...]
-
-## Requirements
-This tool was developed and tested with Python 2.7.5 and 3.7.0a0.
-Imported packages:
-* argparse
-* netcdf4
-* sys
-* datetime
-* csv
-* collections
-* logging
-    
-
-## Functions
-### Function: `assemble_matches(filename)`
-Read a DOMS netCDF file into memory and return a list of matches from the file.
-
-#### Parameters 
-- `filename` (str): the DOMS netCDF file name.
-    
-#### Returns
-- `matches` (list): List of matches. 
-
-Each list element in `matches` is a dictionary organized as follows:
-    For match `m`, netCDF group `GROUP` ('SatelliteData' or 'InsituData'), and netCDF group variable `VARIABLE`:
-
-`matches[m][GROUP]['matchID']`: netCDF `MatchedRecords` dimension ID for the match
-`matches[m][GROUP]['GROUPID']`: GROUP netCDF `dim` dimension ID for the record
-`matches[m][GROUP][VARIABLE]`: variable value 
-
-For example, to access the timestamps of the satellite data and the in situ data of the first match in the list, along with the `MatchedRecords` dimension ID and the groups' `dim` dimension ID:
-```python
-matches[0]['SatelliteData']['time']
-matches[0]['InsituData']['time']
-matches[0]['SatelliteData']['matchID']
-matches[0]['SatelliteData']['SatelliteDataID']
-matches[0]['InsituData']['InsituDataID']
-```
-
-        
-### Function: `matches_to_csv(matches, csvfile)`
-Write the DOMS matches to a CSV file. Include a header of column names which are based on the group and variable names from the netCDF file.
-    
-#### Parameters:
-- `matches` (list): the list of dictionaries containing the DOMS matches as returned from the `assemble_matches` function.
-- `csvfile` (str): the name of the CSV output file.
-
-## Usage
-For example, to read some DOMS netCDF file called `doms_file.nc`:
-### Command line
-The main function for `doms_reader.py` takes one `filename` parameter (`doms_file.nc` argument in this example) for the DOMS netCDF file to read, calls the `assemble_matches` function, then calls the `matches_to_csv` function to write the matches to a CSV file `doms_matches.csv`.
-```
-python doms_reader.py doms_file.nc
-```
-```
-python3 doms_reader.py doms_file.nc
-```
-### Importing `assemble_matches`
-```python
-from doms_reader import assemble_matches
-matches = assemble_matches('doms_file.nc')
-```
diff --git a/tools/doms/doms_reader.py b/tools/doms/doms_reader.py
deleted file mode 100644
index 7c614ce..0000000
--- a/tools/doms/doms_reader.py
+++ /dev/null
@@ -1,144 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements.  See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import argparse
-from netCDF4 import Dataset, num2date
-import sys
-import datetime
-import csv
-from collections import OrderedDict
-import logging
-
-LOGGER = logging.getLogger("doms_reader")
-
-def assemble_matches(filename):
-    """
-    Read a DOMS netCDF file and return a list of matches.
-    
-    Parameters
-    ----------
-    filename : str
-        The DOMS netCDF file name.
-    
-    Returns
-    -------
-    matches : list
-        List of matches. Each list element is a dictionary.
-        For match m, netCDF group GROUP (SatelliteData or InsituData), and
-        group variable VARIABLE:
-        matches[m][GROUP]['matchID']: MatchedRecords dimension ID for the match
-        matches[m][GROUP]['GROUPID']: GROUP dim dimension ID for the record
-        matches[m][GROUP][VARIABLE]: variable value 
-    """
-    
-    try:
-        # Open the netCDF file
-        with Dataset(filename, 'r') as doms_nc:
-            # Check that the number of groups is consistent w/ the MatchedGroups
-            # dimension
-            assert len(doms_nc.groups) == doms_nc.dimensions['MatchedGroups'].size,\
-                ("Number of groups isn't the same as MatchedGroups dimension.")
-            
-            matches = []
-            matched_records = doms_nc.dimensions['MatchedRecords'].size
-            
-            # Loop through the match IDs to assemble matches
-            for match in range(0, matched_records):
-                match_dict = OrderedDict()
-                # Grab the data from each platform (group) in the match
-                for group_num, group in enumerate(doms_nc.groups):
-                    match_dict[group] = OrderedDict()
-                    match_dict[group]['matchID'] = match
-                    ID = doms_nc.variables['matchIDs'][match][group_num]
-                    match_dict[group][group + 'ID'] = ID
-                    for var in list(doms_nc.groups[group].variables.keys()):
-                        match_dict[group][var] = doms_nc.groups[group][var][ID]
-                    
-                    # Create a UTC datetime field from timestamp
-                    dt = num2date(match_dict[group]['time'],
-                                  doms_nc.groups[group]['time'].units)
-                    match_dict[group]['datetime'] = dt
-                LOGGER.info(match_dict)
-                matches.append(match_dict)
-            
-            return matches
-    except (OSError, IOError) as err:
-        LOGGER.exception("Error reading netCDF file " + filename)
-        raise err
-    
-def matches_to_csv(matches, csvfile):
-    """
-    Write the DOMS matches to a CSV file. Include a header of column names
-    which are based on the group and variable names from the netCDF file.
-    
-    Parameters
-    ----------
-    matches : list
-        The list of dictionaries containing the DOMS matches as returned from
-        assemble_matches.      
-    csvfile : str
-        The name of the CSV output file.
-    """
-    # Create a header for the CSV. Column names are GROUP_VARIABLE or
-    # GROUP_GROUPID.
-    header = []
-    for key, value in list(matches[0].items()):
-        for otherkey in list(value.keys()):
-            header.append(key + "_" + otherkey)
-    
-    try:
-        # Write the CSV file
-        with open(csvfile, 'w') as output_file:
-            csv_writer = csv.writer(output_file)
-            csv_writer.writerow(header)
-            for match in matches:
-                row = []
-                for group, data in list(match.items()):
-                    for value in list(data.values()):
-                        row.append(value)
-                csv_writer.writerow(row)
-    except (OSError, IOError) as err:
-        LOGGER.exception("Error writing CSV file " + csvfile)
-        raise err
-
-if __name__ == '__main__':
-    """
-    Execution:
-        python doms_reader.py filename
-        OR
-        python3 doms_reader.py filename
-    """
-    logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s',
-                    level=logging.INFO,
-                    datefmt='%Y-%m-%d %H:%M:%S')
-
-    p = argparse.ArgumentParser()
-    p.add_argument('filename', help='DOMS netCDF file to read')
-    args = p.parse_args()
-
-    doms_matches = assemble_matches(args.filename)
-
-    matches_to_csv(doms_matches, 'doms_matches.csv')
-    
-    
-    
-    
-    
-    
-    
-    
-
-    
-    
\ No newline at end of file