Posted to dev@tika.apache.org by Chris Mattmann <ma...@apache.org> on 2015/04/10 19:09:29 UTC

Re: [nsf-polar-usc-students] Metadata & data representation for DIF files

Pushing this to dev@tika.apache.org.


-----Original Message-----
From: Lewis John Mcgibbney <le...@gmail.com>
Date: Friday, April 10, 2015 at 10:00 AM
To: Annie Bryant <an...@gmail.com>
Cc: Aakarsh MHM <aa...@gmail.com>, Chris Mattmann
<Ch...@jpl.nasa.gov>, Chris Mattmann
<ch...@gmail.com>, NSF Polar CyberInfrastructure DR Students
<ns...@googlegroups.com>
Subject: Re: [nsf-polar-usc-students] Metadata & data representation for
DIF files

>Hi Aakarsh,
>
>If you look into the XSD that you posted, there is a reasonable chance
>that you can decipher what is meant to be metadata and what is not.
>
>I would start by considering the following:
>
>* collapse all of the child elements so you can navigate up and down the
>topmost XSD child elements
>* navigate to the <xs:element name="Project"></xs:element> tag
>* I would begin by treating everything below this tag as
>metadata... this may be an incorrect assumption; however, you will see a
>number of fields there which indicate metadata.
>
>As Annie stated, a good place would be the Tika lists as well... you will
>get good feedback.
>Lewis
>
>
>
>On Fri, Apr 10, 2015 at 9:35 AM, Annie Burgess <an...@gmail.com>
>wrote:
>
>Hi Aakarsh, 
>Looks like you are on the right path.  I'll +1 C. Mattmann; this is a
>great question to pose to dev@tika.apache.org.  You will get some good
>feedback and open up the conversation about science metadata formats!
>
>Annie
>
>
>On Thu, Apr 9, 2015 at 11:03 PM, Aakarsh MHM <aa...@gmail.com>
>wrote:
>
>Hi Annie,
>I am working on parsing GCMD DIF files in Tika for NSF Polar data.
>DIF files comply with the XML schema
>(http://gcmd.gsfc.nasa.gov/Aboutus/xml/dif/dif_v9.8.4.xsd). They follow
>the metadata described here:
>http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf
>
>Sample DIF files:
>https://www.aoncadis.org/dataset/active_layer_nims_grid_atqasuk_alaska_2011.dif
>
>https://www.aoncadis.org/dataset/Zamora2010.dif
>
>
>I am using a SAX parser to parse DIF files. Can you please help me with
>the output representation of the parsed data and with separating metadata
>from data?
>For now I am using the tag hierarchy as the key and the leaf tag's text as
>the value.
>
>For example,
><DIF>
>    <parameters>
>        <Category>Earth science</Category>
>        <Topic>xyz</Topic>
>    </parameters>
></DIF>
>
>Output:
>key : value
>DIF-parameters-Category : Earth science
>DIF-parameters-Topic : xyz
>
>Regarding classifying metadata, I was thinking of treating the mandatory
>fields from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf as metadata
>and the rest as data.
>
>Any input will be appreciated.
>
>
>Thanks,
>Aakarsh
>
>
>-- 
>You received this message because you are subscribed to the Google Groups
>"nsf-polar-usc-students" group.
>To unsubscribe from this group and stop receiving emails from it, send an
>email to nsf-polar-usc-students+unsubscribe@googlegroups.com.
>To post to this group, send email to
>nsf-polar-usc-students@googlegroups.com.
>Visit this group at http://groups.google.com/group/nsf-polar-usc-students.
>To view this discussion on the web visit
>https://groups.google.com/d/msgid/nsf-polar-usc-students/CAG%2BLkcv5_75EFx
>8fnJhLQvwt%2BTqDgU4dheYrzo9CcZObv_aa2Q%40mail.gmail.com
><https://groups.google.com/d/msgid/nsf-polar-usc-students/CAG%2BLkcv5_75EF
>x8fnJhLQvwt%2BTqDgU4dheYrzo9CcZObv_aa2Q%40mail.gmail.com?utm_medium=email&
>utm_source=footer>.
>For more options, visit https://groups.google.com/d/optout.
>
>-- 
>------------------------------------------------------
>Ann Bryant Burgess, PhD
>Postdoctoral Fellow
>Computer Science Department
>Viterbi School of Engineering
>University of Southern California
>                  
>Phone:  (585) 738-7549
>------------------------------------------------------
>
>-- 
>Lewis 
>
>
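The key/value scheme proposed in the thread (element path joined with hyphens as the key, leaf-element text as the value) can be sketched with the JDK's built-in SAX API. This is a minimal illustration under stated assumptions, not Tika's actual DIF parser: the class name `DifFlattener` is hypothetical, namespaces and attributes are ignored, and joining repeated leaves (for example, several keywords under the same path) with "; " is an assumed way of handling multi-valued fields.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: flattens XML into "Parent-Child-Leaf" keys with the
// leaf element's text as the value. Not Tika's actual DIF parser.
public class DifFlattener extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();            // root-to-current element names
    private final Deque<StringBuilder> text = new ArrayDeque<>();     // text buffer per open element
    private final Deque<Boolean> hasChild = new ArrayDeque<>();       // whether each open element has children
    private final Map<String, String> fields = new LinkedHashMap<>(); // flattened key -> value, document order

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // The parent (if any) now has a child, so it is not a leaf.
        if (!hasChild.isEmpty()) {
            hasChild.pop();
            hasChild.push(true);
        }
        path.addLast(qName);
        text.push(new StringBuilder());
        hasChild.push(false);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one text node in several chunks, so append.
        if (!text.isEmpty()) {
            text.peek().append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.pop().toString().trim();
        boolean isLeaf = !hasChild.pop();
        if (isLeaf && !value.isEmpty()) {
            String key = String.join("-", path);
            // Assumption: repeated leaves under the same path are joined with "; ".
            fields.merge(key, value, (oldValue, newValue) -> oldValue + "; " + newValue);
        }
        path.removeLast();
    }

    public Map<String, String> getFields() {
        return fields;
    }

    public static Map<String, String> flatten(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DifFlattener handler = new DifFlattener();
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getFields();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<DIF><parameters><Category>Earth science</Category>"
                + "<Topic>xyz</Topic></parameters></DIF>";
        flatten(xml).forEach((k, v) -> System.out.println(k + " : " + v));
    }
}
```

Running `main` prints `DIF-parameters-Category : Earth science` and `DIF-parameters-Topic : xyz`, matching the output format proposed in the thread. Keys are built from the qualified name, so a namespace prefix in the source would show up in the key.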



Re: [nsf-polar-usc-students] Metadata & data representation for DIF files

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
I would say most of the fields are metadata, EXCEPT for:

geographic coordinates
dataset description (arguably data too)
dataset title
dataset contact 

Cheers,
Chris

P.S. Just got off a Hangout with Gautham, FYI.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
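The split suggested above (most fields are metadata, except geographic coordinates, dataset description, dataset title and dataset contact) could be applied as a filter over keys flattened in the `DIF-parameters-Category` style. This is a minimal sketch: the class name `DifFieldClassifier` is hypothetical, and mapping those four categories onto the DIF element names `Spatial_Coverage`, `Summary`, `Entry_Title` and `Personnel` is an assumption that should be checked against dif_v9.8.4.xsd.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: splits flattened DIF fields into metadata and data,
// treating title, description, coordinates and contact as data.
public class DifFieldClassifier {
    // ASSUMPTION: DIF element names standing in for dataset title, dataset
    // description, geographic coordinates and dataset contact; check dif_v9.8.4.xsd.
    static final Set<String> DATA_ELEMENTS =
            Set.of("Entry_Title", "Summary", "Spatial_Coverage", "Personnel");

    // True if any segment of a hyphen-joined key such as "DIF-Summary-Abstract"
    // names one of the data elements.
    public static boolean isData(String flattenedKey) {
        for (String segment : flattenedKey.split("-")) {
            if (DATA_ELEMENTS.contains(segment)) {
                return true;
            }
        }
        return false;
    }

    // Returns {"metadata": ..., "data": ...}, each preserving input order.
    public static Map<String, Map<String, String>> classify(Map<String, String> fields) {
        Map<String, String> metadata = new LinkedHashMap<>();
        Map<String, String> data = new LinkedHashMap<>();
        fields.forEach((key, value) -> (isData(key) ? data : metadata).put(key, value));
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        result.put("metadata", metadata);
        result.put("data", data);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("DIF-Entry_Title", "Example title");
        fields.put("DIF-Spatial_Coverage-Southernmost_Latitude", "70.0");
        fields.put("DIF-Parameters-Category", "Earth science");
        Map<String, Map<String, String>> split = classify(fields);
        System.out.println("metadata: " + split.get("metadata").keySet());
        System.out.println("data: " + split.get("data").keySet());
    }
}
```

Matching on any path segment means everything nested under, say, `Spatial_Coverage` is treated as data. It also assumes element names never contain a hyphen, since the hyphen is the key separator. The dataset description is the borderline case noted above, so `Summary` may reasonably move to the metadata side.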






-----Original Message-----
From: Aakarsh MHM <aa...@gmail.com>
Date: Friday, April 10, 2015 at 5:33 PM
To: Chris Mattmann <ma...@apache.org>
Cc: Lewis John Mcgibbney <le...@gmail.com>, Annie Bryant
<an...@gmail.com>, Chris Mattmann
<Ch...@jpl.nasa.gov>, NSF Polar CyberInfrastructure DR Students
<ns...@googlegroups.com>, "dev@tika.apache.org"
<de...@tika.apache.org>, "gautham.g44@gmail.com" <ga...@gmail.com>
Subject: Re: [nsf-polar-usc-students] Metadata & data representation for
DIF files

>Thanks Annie & Lewis.
>
>
>As Lewis suggested, I looked into the XSD. I am not sure I have the
>background knowledge to classify fields as metadata or data.
>However, from the definitions (both general and field-specific) given in
>http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf, it seems to me that
>all 39 fields in DIF can be classified as metadata.
>
>
>I also compared it with the NetCDF file format, where the structure itself
>clearly defines what is metadata and what is data. The same is not the
>case with DIF.
>
>
>Gautham is working on an ISO 19139 parser, which I believe falls into the
>same category as DIF. He believes all the defined fields in ISO 19139
>are also metadata. I am adding him to the discussion.
>
>
>Regards,
>Aakarsh Medleri Hire Math
>Graduate Student
>University of Southern California
>


Re: [nsf-polar-usc-students] Metadata & data representation for DIF files

Posted by Aakarsh MHM <aa...@gmail.com>.
Thanks Annie & Lewis.

As Lewis suggested, I looked into the XSD. I am not sure I have the
background knowledge to classify fields as metadata or data.
However, from the definitions (both general and field-specific) given in
http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf, it seems to me that all
39 fields in DIF can be classified as metadata.

I also compared it with the NetCDF file format, where the structure itself
clearly defines what is metadata and what is data. The same is not the case
with DIF.

Gautham is working on an ISO 19139 parser, which I believe falls into the same
category as DIF. He believes all the defined fields in ISO 19139 are also
metadata. I am adding him to the discussion.

Regards,
Aakarsh Medleri Hire Math
Graduate Student
University of Southern California
