Posted to commits@sdap.apache.org by nc...@apache.org on 2022/08/09 18:48:40 UTC
[incubator-sdap-nexus] branch master updated: SDAP-390 Update NetCDF reader tool for data match-up (#178)
This is an automated email from the ASF dual-hosted git repository.
nchung pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-sdap-nexus.git
The following commit(s) were added to refs/heads/master by this push:
new 1dc62e2 SDAP-390 Update NetCDF reader tool for data match-up (#178)
1dc62e2 is described below
commit 1dc62e2d9f61c84099982f9c4d13337899511692
Author: JordanGethers <47...@users.noreply.github.com>
AuthorDate: Tue Aug 9 14:48:36 2022 -0400
SDAP-390 Update NetCDF reader tool for data match-up (#178)
* SDAP -390 Update NetCDF reader tool for data match-up
* Update CHANGELOG.md
* Update cdms_reader.py
* Update README.md
* Update cdms_reader.py
* Updated README.md.
Co-authored-by: Jordan Gethers <jg...@mdc-dev-proc.coaps.fsu.edu>
Co-authored-by: nchung <ng...@jpl.nasa.gov>
---
CHANGELOG.md | 7 +-
tools/cdms/README.md | 85 +++++++++++++++
tools/cdms/cdms_reader.py | 250 ++++++++++++++++++++++++++++++++++++++++++++
tools/cdms/requirements.txt | 205 ++++++++++++++++++++++++++++++++++++
tools/doms/README.md | 66 ------------
tools/doms/doms_reader.py | 144 -------------------------
6 files changed, 546 insertions(+), 211 deletions(-)
diff --git a/CHANGELOG.md b/CHANGELOG.md
index d7cf791..aa47d27 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -12,8 +12,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- SDAP-372: Updated `match_spark_doms` to interface with samos_cdms endpoint
- SDAP-393: Included `insitu` in ingress based on the value of `insituAPI.enabled` in `values.yaml`
- SDAP-371: Renamed `/domssubset` endpoint to `/cdmssubset`
+- SDAP-390: Updated NetCDF reader tool for data matchup and added user functionality.
- SDAP-396: Added saildrone insitu api to matchup
### Changed
+
+- SDAP-390: Changed `/doms` to `/cdms` and `doms_reader.py` to `cdms_reader.py`
- domslist endpoint points to AWS insitu instead of doms insitu
### Deprecated
### Removed
@@ -32,5 +35,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Fixed issue where satellite to satellite matchups with the same dataset don't return the expected result
- Fixed CSV and NetCDF matchup output bug
- Fixed NetCDF output switching latitude and longitude
+
+### Security
- Fixed import error causing `/timeSeriesSpark` queries to fail.
-### Security
\ No newline at end of file
+
diff --git a/tools/cdms/README.md b/tools/cdms/README.md
new file mode 100644
index 0000000..23111de
--- /dev/null
+++ b/tools/cdms/README.md
@@ -0,0 +1,85 @@
+# cdms_reader.py
+The functions in cdms_reader.py read a CDMS netCDF file into memory, assemble a list of matches from a primary (satellite) and secondary (satellite or in situ) data set, and optionally output the matches to a CSV file. Each matched pair contains one primary data record and one secondary data record.
+
+The CDMS netCDF files hold two groups (`PrimaryData` and `SecondaryData`). The `matchIDs` netCDF variable contains pairs of IDs (matches) which reference a primary data record and a secondary data record in their respective groups. These records have a many-to-many relationship; one primary record may match to many secondary records, and one secondary record may match to many primary records. The `assemble_matches` function assembles the individual data records into pairs based o [...]
+
+## Requirements
+This tool was developed and tested with Python 3.9.13.
+Imported packages:
+* argparse
+* string
+* netcdf4
+* sys
+* datetime
+* csv
+* collections
+* logging
+
+
+## Functions
+### Function: `assemble_matches(filename)`
+Read a CDMS netCDF file into memory and return a list of matches from the file.
+
+#### Parameters
+- `filename` (str): the CDMS netCDF file name.
+
+#### Returns
+- `matches` (list): List of matches.
+
+Each list element in `matches` is a dictionary organized as follows:
+ For match `m`, netCDF group `GROUP` ('PrimaryData' or 'SecondaryData'), and netCDF group variable `VARIABLE`:
+
+`matches[m][GROUP]['matchID']`: netCDF `MatchedRecords` dimension ID for the match
+`matches[m][GROUP]['GROUPID']`: GROUP netCDF `dim` dimension ID for the record
+`matches[m][GROUP][VARIABLE]`: variable value
+
+For example, to access the timestamps of the primary data and the secondary data of the first match in the list, along with the `MatchedRecords` dimension ID and the groups' `dim` dimension ID:
+```python
+matches[0]['PrimaryData']['time']
+matches[0]['SecondaryData']['time']
+matches[0]['PrimaryData']['matchID']
+matches[0]['PrimaryData']['PrimaryDataID']
+matches[0]['SecondaryData']['SecondaryDataID']
+```
+
+
+### Function: `matches_to_csv(matches, csvfile)`
+Write the CDMS matches to a CSV file. Include a header of column names which are based on the group and variable names from the netCDF file.
+
+#### Parameters:
+- `matches` (list): the list of dictionaries containing the CDMS matches as returned from the `assemble_matches` function.
+- `csvfile` (str): the name of the CSV output file.
+
+### Function: `get_globals(filename)`
+Write the CDMS global attributes to a text file. Additionally,
+within the file there will be a description of where all the different
+outputs go and how to best utilize this program.
+
+#### Parameters:
+- `filename` (str): the name of the original '.nc' input file
+
+### Function: `create_logs(user_option, logName)`
+Write the CDMS log information to a file. Additionally, the user may
+opt to print this information directly to stdout, or discard it entirely.
+
+#### Parameters
+- `user_option` (str): The value parsed into `args.log`, indicating
+which logging option the user selected.
+- `logName` (str): The name of the log file we wish to write to,
+assuming the user did not use the -l option.
+
+## Usage
+For example, to read some CDMS netCDF file called `cdms_file.nc`:
+### Command line
+The main function for `cdms_reader.py` takes one `filename` parameter (`cdms_file.nc` argument in this example) for the CDMS netCDF file to read and calls the `assemble_matches` function. If the -c parameter is utilized, the `matches_to_csv` function is called to write the matches to a CSV file `cdms_file.csv`. If the -g parameter is utilized, the `get_globals` function is called to show the file's global attributes as well as a short explanation of how the files can be best utilized [...]
+```
+python cdms_reader.py cdms_file.nc -c -g
+```
+python3 cdms_reader.py cdms_file.nc -c -g
+```
+python3 cdms_reader.py cdms_file.nc --csv --meta
+### Importing `assemble_matches`
+```python
+from cdms_reader import assemble_matches
+matches = assemble_matches('cdms_file.nc')
+```
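
Reviewer note on the README above: the match structure it documents can be tried out without a CDMS file by building the dictionaries by hand. The sketch below is not part of the commit. The record values, the placeholder `time` numbers, and the helpers `make_match`, `group_by_primary`, and `flatten` are assumptions made for illustration; only the `GROUP`/`matchID`/`GROUPID` layout and the `GROUP_VARIABLE` column naming come from the README and `matches_to_csv`.

```python
import csv
import io
from collections import OrderedDict, defaultdict


def make_match(match_id, primary_id, secondary_id):
    """Build one match dict shaped as the README describes (values are made up)."""
    match = OrderedDict()
    for group, rec_id in (("PrimaryData", primary_id),
                          ("SecondaryData", secondary_id)):
        match[group] = OrderedDict()
        match[group]["matchID"] = match_id
        match[group][group + "ID"] = rec_id
        match[group]["time"] = 1000.0 + rec_id  # placeholder timestamp
    return match


# Many-to-many: primary record 0 pairs with secondary records 0 and 1,
# and secondary record 1 pairs with primary records 0 and 1.
matches = [make_match(0, 0, 0), make_match(1, 0, 1), make_match(2, 1, 1)]


def group_by_primary(matches):
    """Collect the secondary record IDs matched to each primary record ID."""
    grouped = defaultdict(list)
    for m in matches:
        primary_id = m["PrimaryData"]["PrimaryDataID"]
        grouped[primary_id].append(m["SecondaryData"]["SecondaryDataID"])
    return dict(grouped)


def flatten(matches):
    """Produce a GROUP_VARIABLE header and one flat row per match."""
    header = [g + "_" + k for g, data in matches[0].items() for k in data]
    rows = [[v for data in m.values() for v in data.values()] for m in matches]
    return header, rows


header, rows = flatten(matches)
buffer = io.StringIO()  # in-memory stand-in for the CSV output file
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
```

Because every match carries both record IDs, regrouping by `PrimaryDataID` (or `SecondaryDataID`) is enough to recover the many-to-many relationship from the flat list.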
diff --git a/tools/cdms/cdms_reader.py b/tools/cdms/cdms_reader.py
new file mode 100644
index 0000000..4590cc4
--- /dev/null
+++ b/tools/cdms/cdms_reader.py
@@ -0,0 +1,250 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import string
+from netCDF4 import Dataset, num2date
+import sys
+import datetime
+import csv
+from collections import OrderedDict
+import logging
+
+#TODO: Get rid of numpy errors?
+#TODO: Update big SDAP README
+
+LOGGER = logging.getLogger("cdms_reader")
+
+def assemble_matches(filename):
+    """
+    Read a CDMS netCDF file and return a list of matches.
+
+    Parameters
+    ----------
+    filename : str
+        The CDMS netCDF file name.
+
+    Returns
+    -------
+    matches : list
+        List of matches. Each list element is a dictionary.
+        For match m, netCDF group GROUP (PrimaryData or SecondaryData), and
+        group variable VARIABLE:
+        matches[m][GROUP]['matchID']: MatchedRecords dimension ID for the match
+        matches[m][GROUP]['GROUPID']: GROUP dim dimension ID for the record
+        matches[m][GROUP][VARIABLE]: variable value
+    """
+
+    try:
+        # Open the netCDF file
+        with Dataset(filename, 'r') as cdms_nc:
+            # Check that the number of groups is consistent w/ the MatchedGroups
+            # dimension
+            assert len(cdms_nc.groups) == cdms_nc.dimensions['MatchedGroups'].size,\
+                ("Number of groups isn't the same as MatchedGroups dimension.")
+
+            matches = []
+            matched_records = cdms_nc.dimensions['MatchedRecords'].size
+
+            # Loop through the match IDs to assemble matches
+            for match in range(0, matched_records):
+                match_dict = OrderedDict()
+                # Grab the data from each platform (group) in the match
+                for group_num, group in enumerate(cdms_nc.groups):
+                    match_dict[group] = OrderedDict()
+                    match_dict[group]['matchID'] = match
+                    ID = cdms_nc.variables['matchIDs'][match][group_num]
+                    match_dict[group][group + 'ID'] = ID
+                    for var in cdms_nc.groups[group].variables.keys():
+                        match_dict[group][var] = cdms_nc.groups[group][var][ID]
+
+                    # Create a UTC datetime field from timestamp
+                    dt = num2date(match_dict[group]['time'],
+                                  cdms_nc.groups[group]['time'].units)
+                    match_dict[group]['datetime'] = dt
+                LOGGER.info(match_dict)
+                matches.append(match_dict)
+
+            return matches
+    except (OSError, IOError) as err:
+        LOGGER.exception("Error reading netCDF file " + filename)
+        raise err
+
+def matches_to_csv(matches, csvfile):
+    """
+    Write the CDMS matches to a CSV file. Include a header of column names
+    which are based on the group and variable names from the netCDF file.
+
+    Parameters
+    ----------
+    matches : list
+        The list of dictionaries containing the CDMS matches as returned from
+        assemble_matches.
+    csvfile : str
+        The name of the CSV output file.
+    """
+    # Create a header for the CSV. Column names are GROUP_VARIABLE or
+    # GROUP_GROUPID.
+    header = []
+    for key, value in matches[0].items():
+        for otherkey in value.keys():
+            header.append(key + "_" + otherkey)
+
+    try:
+        # Write the CSV file
+        with open(csvfile, 'w') as output_file:
+            csv_writer = csv.writer(output_file)
+            csv_writer.writerow(header)
+            for match in matches:
+                row = []
+                for group, data in match.items():
+                    for value in data.values():
+                        row.append(value)
+                csv_writer.writerow(row)
+    except (OSError, IOError) as err:
+        LOGGER.exception("Error writing CSV file " + csvfile)
+        raise err
+
+def get_globals(filename):
+    """
+    Write the CDMS global attributes to a text file. Additionally,
+    within the file there will be a description of where all the different
+    outputs go and how to best utilize this program.
+
+    Parameters
+    ----------
+    filename : str
+        The name of the original '.nc' input file.
+
+    """
+    x0 = "README / cdms_reader.py Program Use and Description:\n"
+    x1 = "\nThe cdms_reader.py program reads a CDMS netCDF (a NETCDF file with a matchIDs variable)\n"
+    x2 = "file into memory, assembles a list of matches of primary and secondary data\n"
+    x3 = "and optionally\n"
+    x4 = "outputs the matches to a CSV file. Each matched pair contains one primary\n"
+    x5 = "data record and one secondary data record.\n"
+    x6 = "\nBelow, this file will list the global attributes of the .nc (NETCDF) file.\n"
+    x7 = "If you wish to see a full dump of the data from the .nc file,\n"
+    x8 = "please utilize the ncdump command from NETCDF (or look at the CSV file).\n"
+    try:
+        with Dataset(filename, "r", format="NETCDF4") as ncFile:
+            txtName = filename.replace(".nc", ".txt")
+            with open(txtName, "w") as txt:
+                txt.write(x0 + x1 +x2 +x3 + x4 + x5 + x6 + x7 + x8)
+                txt.write("\nGlobal Attributes:")
+                for x in ncFile.ncattrs():
+                    txt.write(f'\t :{x} = "{ncFile.getncattr(x)}" ;\n')
+
+
+    except (OSError, IOError) as err:
+        LOGGER.exception("Error reading netCDF file " + filename)
+        print("Error reading file!")
+        raise err
+
+def create_logs(user_option, logName):
+    """
+    Write the CDMS log information to a file. Additionally, the user may
+    opt to print this information directly to stdout, or discard it entirely.
+
+    Parameters
+    ----------
+    user_option : str
+        The value parsed into args.log, indicating
+        which logging option the user selected.
+    logName : str
+        The name of the log file we wish to write to,
+        assuming the user did not use the -l option.
+    """
+    if user_option == 'N':
+        print("** Note: No log was created **")
+
+
+    elif user_option == '1':
+        #prints the log contents to stdout
+        logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s',
+                            level=logging.INFO,
+                            datefmt='%Y-%m-%d %H:%M:%S',
+                            handlers=[
+                                logging.StreamHandler(sys.stdout)
+                            ])
+
+    else:
+        #prints log to a .log file
+        logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s',
+                            level=logging.INFO,
+                            datefmt='%Y-%m-%d %H:%M:%S',
+                            handlers=[
+                                logging.FileHandler(logName)
+                            ])
+        if user_option != '1' and user_option != 'Y':
+            print(f"** Bad usage of log option. Log will print to {logName} **")
+
+
+
+
+
+if __name__ == '__main__':
+    """
+    Execution:
+        python cdms_reader.py filename
+        OR
+        python3 cdms_reader.py filename
+        OR
+        python3 cdms_reader.py filename -c -g
+        OR
+        python3 cdms_reader.py filename --csv --meta
+
+    Note (For Help Try):
+        python3 cdms_reader.py -h
+        OR
+        python3 cdms_reader.py --help
+
+    """
+
+    u0 = '\n%(prog)s -h OR --help \n'
+    u1 = '%(prog)s filename -c -g\n%(prog)s filename --csv --meta\n'
+    u2 ='Use -l OR -l1 to modify destination of logs'
+    p = argparse.ArgumentParser(usage= u0 + u1 + u2)
+
+    #below block is to customize user options
+    p.add_argument('filename', help='CDMS netCDF file to read')
+    p.add_argument('-c', '--csv', nargs='?', const= 'Y', default='N',
+                   help='Use -c or --csv to retrieve CSV output')
+    p.add_argument('-g', '--meta', nargs='?', const='Y', default='N',
+                   help='Use -g or --meta to retrieve global attributes / metadata')
+    p.add_argument('-l', '--log', nargs='?', const='N', default='Y',
+                   help='Use -l or --log to AVOID creating log files, OR use -l1 to print to stdout/console')
+
+    #arguments are processed by the next line
+    args = p.parse_args()
+
+    logName = args.filename.replace(".nc", ".log")
+    create_logs(args.log, logName)
+
+    cdms_matches = assemble_matches(args.filename)
+
+    if args.csv == 'Y' :
+        matches_to_csv(cdms_matches, args.filename.replace(".nc",".csv"))
+
+    if args.meta == 'Y' :
+        get_globals(args.filename)
+
+
+
+
+
+
+
+
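
Reviewer note on the option handling above: the `-c`, `-g`, and `-l` flags rely on how `argparse` resolves `nargs='?'` together with `const` and `default`. The sketch below is not part of the commit. It rebuilds only the `-c` and `-l` options with the same settings (help text omitted), and `build_parser` is a hypothetical helper name introduced for illustration.

```python
import argparse


def build_parser():
    """Recreate the -c and -l options with the same nargs/const/default values."""
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    # Flag absent -> default; flag given alone -> const; flag with a value -> that value
    parser.add_argument('-c', '--csv', nargs='?', const='Y', default='N')
    parser.add_argument('-l', '--log', nargs='?', const='N', default='Y')
    return parser


parser = build_parser()

no_flags = parser.parse_args(['cdms_file.nc'])        # nothing requested
bare_l = parser.parse_args(['cdms_file.nc', '-l'])    # -l alone: const applies
l_one = parser.parse_args(['cdms_file.nc', '-l1'])    # -l1: attached value '1'
with_csv = parser.parse_args(['cdms_file.nc', '-c'])  # -c alone: const applies

for label, ns in (('no flags', no_flags), ('-l', bare_l),
                  ('-l1', l_one), ('-c', with_csv)):
    print(f"{label:9s} csv={ns.csv!r} log={ns.log!r}")
```

Note that the sense of `-l` is inverted relative to `-c`: omitting `-l` yields `'Y'` (write a log file), while passing it alone yields `'N'` (no log). Every value arrives as a string, so `create_logs` has to compare against `'1'`, not the integer `1`.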
diff --git a/tools/cdms/requirements.txt b/tools/cdms/requirements.txt
new file mode 100644
index 0000000..88945ac
--- /dev/null
+++ b/tools/cdms/requirements.txt
@@ -0,0 +1,205 @@
+anyio==3.6.1
+appdirs==1.4.4
+argcomplete==1.12.0
+argon2-cffi==21.3.0
+argon2-cffi-bindings==21.2.0
+asn1crypto==1.4.0
+astroid==2.6.6
+asttokens==2.0.5
+atomicwrites==1.4.0
+attrs==20.3.0
+Babel==2.10.1
+backcall==0.1.0
+basemap==1.2.1
+beautifulsoup4==4.9.3
+bleach==5.0.0
+Bottleneck==1.2.1
+Cartopy==0.19.0
+ceph==1.0.0
+cephfs==2.0.0
+certifi==2020.12.5
+cffi==1.14.5
+cftime==1.4.1
+chardet==4.0.0
+click==8.1.3
+click-plugins==1.1.1
+cligj==0.7.2
+cloudpickle==1.6.0
+conda==4.10.1
+conda-package-handling==1.7.2
+configobj==5.0.6
+configparser==5.0.2
+cryptography==3.4.6
+cycler==0.10.0
+Cython==0.29.21
+cytoolz==0.11.0
+dasbus==1.4
+dbus-python==1.2.18
+debugpy==1.6.0
+decorator==4.4.2
+defusedxml==0.7.1
+distro==1.5.0
+entrypoints==0.4
+executing==0.8.3
+fail2ban==0.11.2
+fastjsonschema==2.15.3
+Fiona==1.8.21
+frozendict==1.2
+future==0.18.2
+GDAL==3.2.2
+geopandas==0.10.2
+Glances==3.1.4.1
+gpg==1.15.1
+idna==2.10
+importlib-metadata==4.11.4
+iniconfig==1.1.1
+iniparse==0.4
+iotop==0.6
+ipykernel==6.13.0
+ipython==8.4.0
+ipython-genutils==0.1.0
+isc==2.0
+isort==5.7.0
+jedi==0.17.2
+Jinja2==3.1.2
+joblib==1.0.1
+json5==0.9.8
+jsonschema==4.5.1
+jupyter-client==7.3.1
+jupyter-core==4.10.0
+jupyter-server==1.17.0
+jupyterlab==3.4.2
+jupyterlab-pygments==0.2.2
+jupyterlab-server==2.14.0
+kiwisolver==1.3.2
+lazy-object-proxy==1.6.0
+libarchive-c==2.9
+libcomps==0.1.18
+lxml==4.6.5
+lz4==3.0.2
+MarkupSafe==2.1.1
+matplotlib==3.4.3
+matplotlib-inline==0.1.3
+mccabe==0.6.1
+MetPy==1.1.0
+mistune==0.8.4
+mock==3.0.5
+more-itertools==8.12.0
+munch==2.5.0
+mysql-connector-python==8.0.21
+mysqlclient==1.4.6
+nbclassic==0.3.7
+nbclient==0.6.4
+nbconvert==6.5.0
+nbformat==5.4.0
+nest-asyncio==1.5.5
+netCDF4==1.5.5.1
+nftables==0.1
+notebook==6.4.11
+notebook-shim==0.1.0
+ntp==1.2.1
+numexpr==2.7.1
+numpy==1.20.1
+olefile==0.46
+OWSLib==0.21.0
+packaging==20.9
+pandas==1.2.5
+pandas-datareader==0.9.0
+pandocfilters==1.5.0
+parso==0.8.0
+pathlib2==2.3.6
+pbr==5.5.1
+perf==0.1
+pexpect==4.8.0
+pickle5==0.0.11
+pickleshare==0.7.5
+Pillow==8.1.2
+Pint==0.17
+pluggy==0.13.1
+ply==3.11
+podaacpy==2.4.0
+pooch==1.5.1
+prettytable==0.7.2
+prometheus-client==0.14.1
+prompt-toolkit==3.0.5
+protobuf==3.14.0
+psutil==5.8.0
+ptyprocess==0.6.0
+pure-eval==0.2.2
+py==1.11.0
+py-cpuinfo==7.0.0
+py3nvml==0.2.6
+pycairo==1.20.1
+pycosat==0.6.3
+pycparser==2.20
+Pygments==2.7.4
+PyGObject==3.40.1
+pykdtree==1.3.4
+pylint==2.9.6
+PyMySQL==0.10.1
+pyOpenSSL==21.0.0
+pyparsing==2.4.7
+pyproj==3.2.0
+PyQt4-sip==4.19.24
+PyQt5==5.15.0
+PyQt5-sip==4.19.24
+pyrsistent==0.18.1
+pyshp==2.1.3
+PySocks==1.7.1
+pytest==6.2.2
+pytest-runner==4.0
+python-dateutil==2.8.2
+python-dmidecode==3.12.2
+python-linux-procfs==0.6.3
+pytz==2021.3
+pyudev==0.22.0
+PyYAML==5.4.1
+pyzmq==23.0.0
+rados==2.0.0
+rbd==2.0.0
+requests==2.25.1
+requests-file==1.5.1
+rgw==2.0.0
+rpm==4.16.1.3
+ruamel.yaml==0.16.6
+ruamel.yaml.clib==0.1.2
+ruptures==1.1.4
+scikit-learn==0.24.1
+scipy==1.6.2
+selinux==3.2
+Send2Trash==1.8.0
+sepolicy==3.2
+setools==4.4.0
+setroubleshoot==1.1
+Shapely==1.7.1
+simplegeneric==0.8.1
+six==1.15.0
+slip==0.6.4
+slip.dbus==0.6.4
+smartcols==0.3.0
+sniffio==1.2.0
+sos==4.1
+soupsieve==2.2.1
+SSSDConfig==2.5.2
+stack-data==0.2.0
+systemd-python==234
+tables==3.6.1
+terminado==0.15.0
+threadpoolctl==2.0.0
+tinycss2==1.1.1
+toml==0.10.2
+toolz==0.11.1
+tornado==6.1
+tqdm==4.62.1
+traitlets==5.2.2.post1
+typed-ast==1.4.3
+urllib3==1.25.10
+wcwidth==0.2.5
+webencodings==0.5.1
+websocket-client==1.3.2
+wrapt==1.12.1
+xarray==0.19.0
+xlrd==2.0.1
+xlwt==1.3.0
+xmltodict==0.12.0
+zipp==3.8.0
diff --git a/tools/doms/README.md b/tools/doms/README.md
deleted file mode 100644
index c49fa4a..0000000
--- a/tools/doms/README.md
+++ /dev/null
@@ -1,66 +0,0 @@
-# doms_reader.py
-The functions in doms_reader.py read a DOMS netCDF file into memory, assemble a list of matches of satellite and in situ data, and optionally output the matches to a CSV file. Each matched pair contains one satellite data record and one in situ data record.
-
-The DOMS netCDF files hold satellite data and in situ data in different groups (`SatelliteData` and `InsituData`). The `matchIDs` netCDF variable contains pairs of IDs (matches) which reference a satellite data record and an in situ data record in their respective groups. These records have a many-to-many relationship; one satellite record may match to many in situ records, and one in situ record may match to many satellite records. The `assemble_matches` function assembles the individua [...]
-
-## Requirements
-This tool was developed and tested with Python 2.7.5 and 3.7.0a0.
-Imported packages:
-* argparse
-* netcdf4
-* sys
-* datetime
-* csv
-* collections
-* logging
-
-
-## Functions
-### Function: `assemble_matches(filename)`
-Read a DOMS netCDF file into memory and return a list of matches from the file.
-
-#### Parameters
-- `filename` (str): the DOMS netCDF file name.
-
-#### Returns
-- `matches` (list): List of matches.
-
-Each list element in `matches` is a dictionary organized as follows:
- For match `m`, netCDF group `GROUP` ('SatelliteData' or 'InsituData'), and netCDF group variable `VARIABLE`:
-
-`matches[m][GROUP]['matchID']`: netCDF `MatchedRecords` dimension ID for the match
-`matches[m][GROUP]['GROUPID']`: GROUP netCDF `dim` dimension ID for the record
-`matches[m][GROUP][VARIABLE]`: variable value
-
-For example, to access the timestamps of the satellite data and the in situ data of the first match in the list, along with the `MatchedRecords` dimension ID and the groups' `dim` dimension ID:
-```python
-matches[0]['SatelliteData']['time']
-matches[0]['InsituData']['time']
-matches[0]['SatelliteData']['matchID']
-matches[0]['SatelliteData']['SatelliteDataID']
-matches[0]['InsituData']['InsituDataID']
-```
-
-
-### Function: `matches_to_csv(matches, csvfile)`
-Write the DOMS matches to a CSV file. Include a header of column names which are based on the group and variable names from the netCDF file.
-
-#### Parameters:
-- `matches` (list): the list of dictionaries containing the DOMS matches as returned from the `assemble_matches` function.
-- `csvfile` (str): the name of the CSV output file.
-
-## Usage
-For example, to read some DOMS netCDF file called `doms_file.nc`:
-### Command line
-The main function for `doms_reader.py` takes one `filename` parameter (`doms_file.nc` argument in this example) for the DOMS netCDF file to read, calls the `assemble_matches` function, then calls the `matches_to_csv` function to write the matches to a CSV file `doms_matches.csv`.
-```
-python doms_reader.py doms_file.nc
-```
-```
-python3 doms_reader.py doms_file.nc
-```
-### Importing `assemble_matches`
-```python
-from doms_reader import assemble_matches
-matches = assemble_matches('doms_file.nc')
-```
diff --git a/tools/doms/doms_reader.py b/tools/doms/doms_reader.py
deleted file mode 100644
index 7c614ce..0000000
--- a/tools/doms/doms_reader.py
+++ /dev/null
@@ -1,144 +0,0 @@
-# Licensed to the Apache Software Foundation (ASF) under one or more
-# contributor license agreements. See the NOTICE file distributed with
-# this work for additional information regarding copyright ownership.
-# The ASF licenses this file to You under the Apache License, Version 2.0
-# (the "License"); you may not use this file except in compliance with
-# the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import argparse
-from netCDF4 import Dataset, num2date
-import sys
-import datetime
-import csv
-from collections import OrderedDict
-import logging
-
-LOGGER = logging.getLogger("doms_reader")
-
-def assemble_matches(filename):
-    """
-    Read a DOMS netCDF file and return a list of matches.
-
-    Parameters
-    ----------
-    filename : str
-        The DOMS netCDF file name.
-
-    Returns
-    -------
-    matches : list
-        List of matches. Each list element is a dictionary.
-        For match m, netCDF group GROUP (SatelliteData or InsituData), and
-        group variable VARIABLE:
-        matches[m][GROUP]['matchID']: MatchedRecords dimension ID for the match
-        matches[m][GROUP]['GROUPID']: GROUP dim dimension ID for the record
-        matches[m][GROUP][VARIABLE]: variable value
-    """
-
-    try:
-        # Open the netCDF file
-        with Dataset(filename, 'r') as doms_nc:
-            # Check that the number of groups is consistent w/ the MatchedGroups
-            # dimension
-            assert len(doms_nc.groups) == doms_nc.dimensions['MatchedGroups'].size,\
-                ("Number of groups isn't the same as MatchedGroups dimension.")
-
-            matches = []
-            matched_records = doms_nc.dimensions['MatchedRecords'].size
-
-            # Loop through the match IDs to assemble matches
-            for match in range(0, matched_records):
-                match_dict = OrderedDict()
-                # Grab the data from each platform (group) in the match
-                for group_num, group in enumerate(doms_nc.groups):
-                    match_dict[group] = OrderedDict()
-                    match_dict[group]['matchID'] = match
-                    ID = doms_nc.variables['matchIDs'][match][group_num]
-                    match_dict[group][group + 'ID'] = ID
-                    for var in list(doms_nc.groups[group].variables.keys()):
-                        match_dict[group][var] = doms_nc.groups[group][var][ID]
-
-                    # Create a UTC datetime field from timestamp
-                    dt = num2date(match_dict[group]['time'],
-                                  doms_nc.groups[group]['time'].units)
-                    match_dict[group]['datetime'] = dt
-                LOGGER.info(match_dict)
-                matches.append(match_dict)
-
-            return matches
-    except (OSError, IOError) as err:
-        LOGGER.exception("Error reading netCDF file " + filename)
-        raise err
-
-def matches_to_csv(matches, csvfile):
-    """
-    Write the DOMS matches to a CSV file. Include a header of column names
-    which are based on the group and variable names from the netCDF file.
-
-    Parameters
-    ----------
-    matches : list
-        The list of dictionaries containing the DOMS matches as returned from
-        assemble_matches.
-    csvfile : str
-        The name of the CSV output file.
-    """
-    # Create a header for the CSV. Column names are GROUP_VARIABLE or
-    # GROUP_GROUPID.
-    header = []
-    for key, value in list(matches[0].items()):
-        for otherkey in list(value.keys()):
-            header.append(key + "_" + otherkey)
-
-    try:
-        # Write the CSV file
-        with open(csvfile, 'w') as output_file:
-            csv_writer = csv.writer(output_file)
-            csv_writer.writerow(header)
-            for match in matches:
-                row = []
-                for group, data in list(match.items()):
-                    for value in list(data.values()):
-                        row.append(value)
-                csv_writer.writerow(row)
-    except (OSError, IOError) as err:
-        LOGGER.exception("Error writing CSV file " + csvfile)
-        raise err
-
-if __name__ == '__main__':
-    """
-    Execution:
-        python doms_reader.py filename
-        OR
-        python3 doms_reader.py filename
-    """
-    logging.basicConfig(format='%(asctime)s %(levelname)-8s %(message)s',
-                        level=logging.INFO,
-                        datefmt='%Y-%m-%d %H:%M:%S')
-
-    p = argparse.ArgumentParser()
-    p.add_argument('filename', help='DOMS netCDF file to read')
-    args = p.parse_args()
-
-    doms_matches = assemble_matches(args.filename)
-
-    matches_to_csv(doms_matches, 'doms_matches.csv')
-
-
-
-
-
-
-
-
-
-
-
\ No newline at end of file