Posted to user@tika.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2014/08/15 04:46:37 UTC
Support for HDF5 and netCDF
Hi Folks,
@Annie Bryant in particular,
I would like to have the current state of our support for the following MIME
types on list:
* NetCDF4
* HDF5
I know that we maintain parsers for these types; however, I may have an
extension use case which I would like to discuss.
I am looking to ensure that I can obtain the metadata defined by the
Attribute Conventions Dataset Discovery (ACDD) [0] effort. Please see the
elaboration below:
Each entry below gives the attribute name and type, followed by its
description and an example implementation.

*date_created* (string)
  The date and time the data file was created, in the form
  "yyyymmddThhmmssZ". This time format is ISO 8601 compliant.
  Example: date_created = "2012-04-06T16:26:33Z";

*time_coverage_start* (string)
  Representative date and time of the start of the granule, in the ISO 8601
  compliant format "yyyymmddThhmmssZ".
  Example: time_coverage_start = "2012001013102483"

*time_coverage_end* (string)
  Representative date and time of the end of the granule, in the ISO 8601
  compliant format "yyyymmddThhmmssZ".
  Example: time_coverage_end = "2012002000843304"

*geospatial_lat_max* (float)
  Decimal degrees north, range -90 to +90.
  Example: geospatial_lat_max = 90.0f

*geospatial_lat_min* (float)
  Decimal degrees north, range -90 to +90.
  Example: geospatial_lat_min = -90.0f

*geospatial_lon_max* (float)
  Decimal degrees east, range -180 to +180.
  Example: geospatial_lon_max = 180.0f

*geospatial_lon_min* (float)
  Decimal degrees east, range -180 to +180.
  Example: geospatial_lon_min = -180.0f

*geospatial_lat_resolution* (float)
  Latitude resolution, in units matching geospatial_lat_units.
  Example: geospatial_lat_resolution = 1

*geospatial_lon_resolution* (float)
  Longitude resolution, in units matching geospatial_lon_units.
  Example: geospatial_lon_resolution = 1

*geospatial_lat_units* (string)
  Units of the latitudinal resolution. Typically "degrees_north".
  Example: geospatial_lat_units = "degrees_north"

*geospatial_lon_units* (string)
  Units of the longitudinal resolution. Typically "degrees_east".
  Example: geospatial_lon_units = "degrees_east"

*platform* (string)
  Satellite(s) used to create this data file.
  Example: platform = "Aquarius/SAC-D"

*sensor* (string)
  Sensor(s) used to create this data file.
  Example: sensor = "Aquarius"

*project* (string)
  Project/mission name.
  Example: project = "Aquarius"

*product_version* (string)
  The product version of this data file, which may be different than the
  file version used in the file naming convention.
  Example: product_version = "1.3"

*processing_level* (string)
  Product processing level (e.g. L2, L3, L4).
  Example: processing_level = 3

*keywords* (string)
  Comma-separated list of GCMD Science Keywords from
  http://gcmd.nasa.gov/learn/keyword_list.html
  Example: keywords_vocabulary = "SURFACE SALINITY, SALINITY, AQUARIUS SAC-D"
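
To make the use case a little more concrete, here is a minimal Python sketch of the kind of check I have in mind once the attributes have been extracted. It only uses the ranges and the "yyyymmddThhmmssZ" form stated in the descriptions above. The dictionary layout and the names check_acdd and parse_compact_timestamp are assumptions made for illustration; they are not part of Tika or of the ACDD documents.

```python
from datetime import datetime, timezone

# strptime pattern for the compact "yyyymmddThhmmssZ" form named above.
COMPACT_ISO_8601 = "%Y%m%dT%H%M%SZ"

# Valid ranges from the descriptions: latitude -90..+90, longitude -180..+180.
RANGES = {
    "geospatial_lat_min": (-90.0, 90.0),
    "geospatial_lat_max": (-90.0, 90.0),
    "geospatial_lon_min": (-180.0, 180.0),
    "geospatial_lon_max": (-180.0, 180.0),
}

TIME_ATTRIBUTES = ("date_created", "time_coverage_start", "time_coverage_end")


def parse_compact_timestamp(value):
    """Parse a 'yyyymmddThhmmssZ' string into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value, COMPACT_ISO_8601)
    return parsed.replace(tzinfo=timezone.utc)


def check_acdd(metadata):
    """Return a list of problems found in `metadata`.

    `metadata` is a hypothetical dict of attribute name -> value, standing in
    for whatever a parser hands back.
    """
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in metadata:
            problems.append(f"{name}: missing")
            continue
        try:
            value = float(metadata[name])
        except (TypeError, ValueError):
            problems.append(f"{name}: not a number: {metadata[name]!r}")
            continue
        if not low <= value <= high:
            problems.append(f"{name}: {value} outside [{low}, {high}]")
    for name in TIME_ATTRIBUTES:
        if name not in metadata:
            problems.append(f"{name}: missing")
            continue
        try:
            parse_compact_timestamp(str(metadata[name]))
        except ValueError:
            problems.append(f"{name}: not in yyyymmddThhmmssZ form")
    return problems
```

Note that a check this strict only accepts the compact form, so a value written with dashes and colons would be reported as a problem. Whether to also accept other ISO 8601 variants is a design choice I have left open.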
I would also like to obtain the attributes defined by the Climate and
Forecast (CF) metadata conventions, which look like this:

*Conventions* (string)
  Version of the convention standard implemented by the file, interpreted as
  a directory name relative to a directory that is a repository of documents
  describing sets of discipline-specific conventions.
  Example: Conventions = "CF-1.6";

*title* (string)
  A succinct description of what is in the dataset.
  Example: title = "Aquarius CAP Level-3 1x1 Deg Gridded 7-Day Bin Averaged Maps";

*history* (string)
  Used to document provenance. Provides an audit trail for modifications to
  the original data. We recommend that each line begin with a timestamp
  indicating the date and time of day that the program was executed.
  Example: history = "L2_1.3CAP2.1.4";

*institution* (string)
  Specifies where the original data was produced.
  Example: institution = "JPL";

*source* (string)
  The method of production of the original data. If it was model-generated,
  source should name the model and its version, as specifically as could be
  useful. If it is observational, source should characterize it (e.g.,
  "surface observation" or "radiosonde").
  Example: source = "CAPV1.3-HDF5";

*comment* (string)
  Miscellaneous information about the data or methods used to produce it.
  Example: comment = "rolling 7 day means at 1 degree spatial resolution";

*references* (string)
  Published or web-based references that describe the data or methods used
  to produce it.
  Example: references = "Yueh,S.,Tang,
  W.,Fore,A.,Freedman,A.,Neumann,G.,Chaubell,J.,Hayashi,A (2012).SIMULTANEOUS
  SALINITY AND WIND RETRIEVAL USING THE CAP ALGORITHM FOR AQUARIUS.
  http://www.igarss2012.org/Papers/viewpapers.asp?papernum=1596";

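
Along the same lines, a rough Python sketch of what I would want to do with the CF global attributes listed above: confirm that they are all present and read the version out of the Conventions string. The function names and the metadata dictionary are hypothetical. Splitting the Conventions value on commas or whitespace is also an assumption on my part, to cope with files that name more than one convention in that attribute.

```python
import re

# The CF global attributes listed above.
CF_GLOBAL_ATTRIBUTES = (
    "Conventions",
    "title",
    "history",
    "institution",
    "source",
    "comment",
    "references",
)


def missing_cf_attributes(metadata):
    """Return the listed attributes that are absent or blank in `metadata`.

    `metadata` is a hypothetical dict of attribute name -> value.
    """
    return [
        name
        for name in CF_GLOBAL_ATTRIBUTES
        if not str(metadata.get(name, "")).strip()
    ]


def cf_version(conventions):
    """Return (major, minor) from a value such as 'CF-1.6', or None.

    Assumption: several convention names may share the attribute, separated
    by commas or whitespace, so every token is inspected.
    """
    for token in re.split(r"[,\s]+", conventions.strip()):
        match = re.fullmatch(r"CF-(\d+)\.(\d+)", token)
        if match:
            return int(match.group(1)), int(match.group(2))
    return None
```

Returning None rather than raising keeps the check usable on files that do not follow CF at all, which I expect to be common in a mixed crawl.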
[0]
http://wiki.esipfed.org/index.php/Category:Attribute_Conventions_Dataset_Discovery
--
*Lewis*
Re: Support for HDF5 and netCDF
Posted by Annie Burgess <an...@gmail.com>.
Hey Lewis,
Our current NetCDF parser supports the extraction of the metadata you are
looking for. The text output of the NetCDF parser provides all
'dimensions' and 'variables,' while the metadata output provides all
'attribute' information.
The HDF parser is much more limited. Would expanding the HDF parser be
something you'd put priority on? I'm currently working to expand the GRIB2
parsing capabilities; however, GRIB2, NetCDF, and HDF all use the UCAR Java
library, so it would be a convenient time to expand the HDF parser!
Annie
--
------------------------------------------------------------------------------------------
Ann Bryant Burgess, PhD
Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA
Alaska Science Center/USGS
Anchorage, AK
Cell: (585) 738-7549
Office: (907) 786-7059
Fax: (907) 786-7150
E-mail: anniebryant.burgess@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
-------------------------------------------------------------------------------------------