Posted to user@tika.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2014/08/15 04:46:37 UTC

Support for HDF5 and netCDF

Hi Folks,

@Annie Bryant in particular,

I would like to have on list the current state of our support for the
following MIME types:

 * NetCDF4
 * HDF5

I know that we maintain parsers for these types; however, I may have an
extension use case which I would like to discuss.
I am looking to ensure that I can obtain metadata defined by the Attribute
Conventions Dataset Discovery [0] effort. Please see the elaboration below:

Each entry gives the attribute name and type, then its description and an
example implementation.

*date_created* (string)
  Description: The date and time the data file was created in the form
  “yyyymmddThhmmssZ”. This time format is ISO 8601 compliant.
  Example: date_created = “2012-04-06T16:26:33Z”;

*time_coverage_start* (string)
  Description: Representative date and time of the start of the granule in
  the ISO 8601 compliant format of “yyyymmddThhmmssZ”.
  Example: time_coverage_start = “2012001013102483”

*time_coverage_end* (string)
  Description: Representative date and time of the end of the granule in
  the ISO 8601 compliant format of “yyyymmddThhmmssZ”.
  Example: time_coverage_end = “2012002000843304”

*geospatial_lat_max* (float)
  Description: Decimal degrees north, range -90 to +90.
  Example: geospatial_lat_max = 90.0f

*geospatial_lat_min* (float)
  Description: Decimal degrees north, range -90 to +90.
  Example: geospatial_lat_min = -90.0f

*geospatial_lon_max* (float)
  Description: Decimal degrees east, range -180 to +180.
  Example: geospatial_lon_max = 180.0f

*geospatial_lon_min* (float)
  Description: Decimal degrees east, range -180 to +180.
  Example: geospatial_lon_min = -180.0f

*geospatial_lat_resolution* (float)
  Description: Latitude resolution in units matching geospatial_lat_units.
  Example: geospatial_lat_resolution = 1

*geospatial_lon_resolution* (float)
  Description: Longitude resolution in units matching geospatial_lon_units.
  Example: geospatial_lon_resolution = 1

*geospatial_lat_units* (string)
  Description: Units of the latitudinal resolution. Typically
  “degrees_north”.
  Example: geospatial_lat_units = “degrees_north”

*geospatial_lon_units* (string)
  Description: Units of the longitudinal resolution. Typically
  “degrees_east”.
  Example: geospatial_lon_units = “degrees_east”

*platform* (string)
  Description: Satellite(s) used to create this data file.
  Example: platform = “Aquarius/SAC-D”

*sensor* (string)
  Description: Sensor(s) used to create this data file.
  Example: sensor = “Aquarius”

*project* (string)
  Description: Project/mission name.
  Example: project = “Aquarius”

*product_version* (string)
  Description: The product version of this data file, which may be
  different from the file version used in the file naming convention.
  Example: product_version = “1.3”

*processing_level* (string)
  Description: Product processing level (e.g. L2, L3, L4).
  Example: processing_level = 3

*keywords* (string)
  Description: Comma separated list of GCMD Science Keywords from
  http://gcmd.nasa.gov/learn/keyword_list.html
  Example: keywords_vocabulary = "SURFACE SALINITY, SALINITY,  AQUARIUS SAC-D"
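
To make the use case a little more concrete, here is a minimal sketch of the kind of sanity check a consumer might run once these attributes have been extracted into a plain dictionary. The function names and the dictionary layout are assumptions made for illustration only; they are not part of Tika or of the convention itself. The timestamp helper accepts only the two forms that appear for date_created above (the basic pattern and the extended example) and rejects everything else.

```python
from datetime import datetime, timezone


def parse_iso8601_utc(value):
    """Parse 'YYYY-MM-DDThh:mm:ssZ' or the basic form 'YYYYMMDDThhmmssZ'.

    Hypothetical helper: only these two UTC layouts are accepted.
    """
    for fmt in ("%Y-%m-%dT%H:%M:%SZ", "%Y%m%dT%H%M%SZ"):
        try:
            return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError(f"not a recognised ISO 8601 UTC timestamp: {value!r}")


def check_geospatial_bounds(attrs):
    """Return a list of problems found in the geospatial_*_min/max values.

    Assumes the values, when present, are already numeric.
    """
    problems = []
    limits = {"lat": 90.0, "lon": 180.0}
    for axis, limit in limits.items():
        low_name = f"geospatial_{axis}_min"
        high_name = f"geospatial_{axis}_max"
        low, high = attrs.get(low_name), attrs.get(high_name)
        if low is None or high is None:
            problems.append(f"{low_name} or {high_name} is missing")
            continue
        for name, value in ((low_name, low), (high_name, high)):
            if not -limit <= value <= limit:
                problems.append(f"{name}={value} is outside +/-{limit}")
        # Only latitude ordering is checked: a dataset that crosses the
        # antimeridian could plausibly report lon_min greater than lon_max.
        if axis == "lat" and low > high:
            problems.append(f"{low_name} exceeds {high_name}")
    return problems


# Values taken from the example column above.
example = {
    "date_created": "2012-04-06T16:26:33Z",
    "geospatial_lat_min": -90.0,
    "geospatial_lat_max": 90.0,
    "geospatial_lon_min": -180.0,
    "geospatial_lon_max": 180.0,
}
created = parse_iso8601_utc(example["date_created"])
problems = check_geospatial_bounds(example)
```

Note that the time_coverage_* example values shown above do not match either timestamp layout, so this helper would raise ValueError for them.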

I would also like to obtain the metadata defined by the Climate and Forecast
(CF) metadata convention, which looks like this:

Each entry gives the attribute name and type, then its description and an
example implementation.

*Conventions* (string)
  Description: Version of Convention standard implemented by the file,
  interpreted as a directory name relative to a directory that is a
  repository of documents describing sets of discipline-specific
  conventions.
  Example: Conventions = "CF-1.6";

*title* (string)
  Description: A succinct description of what is in the dataset.
  Example: title = "Aquarius CAP Level-3 1x1 Deg Gridded 7-Day Bin Averaged Maps";

*history* (string)
  Description: Used to document Provenance. Provides an audit trail for
  modifications to the original data. We recommend that each line begin
  with a timestamp indicating the date and time of day that the program was
  executed.
  Example: history = "L2_1.3CAP2.1.4";

*institution* (string)
  Description: Specifies where the original data was produced.
  Example: institution = "JPL";

*source* (string)
  Description: The method of production of the original data. If it was
  model-generated, source should name the model and its version, as
  specifically as could be useful. If it is observational, source should
  characterize it (e.g., "surface observation" or "radiosonde").
  Example: source = "CAPV1.3-HDF5";

*comment* (string)
  Description: Miscellaneous information about the data or methods used to
  produce it.
  Example: comment = "rolling 7 day means at 1 degree spatial resolution";

*references* (string)
  Description: Published or web-based references that describe the data or
  methods used to produce it.
  Example: references = "Yueh,S.,Tang,
  W.,Fore,A.,Freedman,A.,Neumann,G.,Chaubell,J.,Hayashi,A (2012).SIMULTANEOUS
  SALINITY AND WIND RETRIEVAL USING THE CAP ALGORITHM FOR AQUARIUS.
  http://www.igarss2012.org/Papers/viewpapers.asp?papernum=1596";
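
As with the discovery attributes, a rough sketch may help show what "obtaining" these CF attributes could mean downstream: checking that each of the seven global attributes listed above is present, and reading the version number out of the Conventions string. The names below are hypothetical and the attribute dictionary is assumed to come from whatever parser did the extraction; nothing here is Tika API.

```python
import re

# The seven CF global attributes listed above.
CF_GLOBAL_ATTRIBUTES = (
    "Conventions", "title", "history", "institution",
    "source", "comment", "references",
)


def missing_cf_attributes(attrs):
    """Return the listed CF attributes that are absent or blank in attrs."""
    missing = []
    for name in CF_GLOBAL_ATTRIBUTES:
        value = attrs.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def cf_version(conventions):
    """Extract (major, minor) from a string such as 'CF-1.6', else None."""
    match = re.search(r"\bCF-(\d+)\.(\d+)\b", conventions)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Values taken from the example column above (a subset, for brevity).
example = {
    "Conventions": "CF-1.6",
    "title": "Aquarius CAP Level-3 1x1 Deg Gridded 7-Day Bin Averaged Maps",
    "institution": "JPL",
    "source": "CAPV1.3-HDF5",
}
missing = missing_cf_attributes(example)
version = cf_version(example["Conventions"])
```

Returning the missing names rather than raising keeps the check usable as a report over many files, which seems closer to the discovery use case than a hard failure.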


[0]
http://wiki.esipfed.org/index.php/Category:Attribute_Conventions_Dataset_Discovery

-- 
*Lewis*

Re: Support for HDF5 and netCDF

Posted by Annie Burgess <an...@gmail.com>.
Hey Lewis,

Our current NetCDF parser supports the extraction of the metadata you are
looking for.  The text output of the NetCDF parser provides all
'dimensions' and 'variables,' while the metadata output provides all
'attribute' information.

The HDF parser is much more limited.  Would expanding the HDF parser be
something you'd put priority on?  I'm currently working to expand the GRIB2
parsing capabilities; however, the GRIB2, NetCDF, and HDF parsers all use the
UCAR Java library, so it would be a convenient time to expand the HDF parser!

Annie







-- 
------------------------------------------------------------------------------------------
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burgess@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
-------------------------------------------------------------------------------------------