Posted to dev@tika.apache.org by "Ann Burgess (JIRA)" <ji...@apache.org> on 2014/04/29 01:11:18 UTC

[jira] [Comment Edited] (TIKA-1274) ENVI header parser

    [ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983597#comment-13983597 ] 

Ann Burgess edited comment on TIKA-1274 at 4/28/14 11:10 PM:
-------------------------------------------------------------

Hi Nick,

Thank you for the git repo tips.  I added the 'target' directory because I
was mimicking the directory structure of the Tika build; consider it removed.
On that note, I'd appreciate any documentation on the dos and don'ts of
building a git repo for Tika or other Apache projects... if such
documentation exists.

As for the file contents, ENVI header files
<http://www.exelisvis.com/docs/ENVIHeaderFiles.html> are plain text
documents. The contents of an ENVI header file are, in fact, metadata
for a corresponding data file: to read a file named some_file.img, you
need the corresponding file some_file.img.hdr.  In other words, because
the entire contents of some_file.img.hdr are metadata for some_file.img,
they do NOT describe the .hdr file itself; rather, they describe the
.img file.  That is why I didn't think it appropriate to move parts of
the 'raw content' into metadata.  Does that make sense?  I'm also very
open to hearing how this sort of thing is normally treated, or to
opening a conversation about how to treat one file type describing
another file type.
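
To make the "header describes the data file" point concrete, here is a
minimal sketch of what reading such a header into key/value pairs might
look like. It is not the parser attached to this issue: the class name
EnviHeaderSketch and both method names are hypothetical, it assumes the
simple "key = value" layout with optional {...} values that may span
lines, and it assumes the some_file.img.hdr naming convention described
above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika or of the
// parser attached to TIKA-1274.
class EnviHeaderSketch {

    // Splits ENVI header text into "key = value" fields, keeping file order.
    static Map<String, String> parse(String headerText) {
        Map<String, String> fields = new LinkedHashMap<>();
        String openKey = null;                 // key of a {...} value still being read
        StringBuilder openValue = new StringBuilder();

        for (String rawLine : headerText.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (openKey != null) {
                // Continuation of a brace-delimited value spanning several lines.
                openValue.append(' ').append(line);
                if (line.contains("}")) {
                    fields.put(openKey, openValue.toString());
                    openKey = null;
                }
                continue;
            }
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue;                      // e.g. the leading "ENVI" line or a blank line
            }
            String key = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            if (value.startsWith("{") && !value.contains("}")) {
                openKey = key;                 // value continues on following lines
                openValue = new StringBuilder(value);
            } else {
                fields.put(key, value);
            }
        }
        return fields;
    }

    // Assumes the "<data file>.hdr" convention: drops a trailing ".hdr".
    static String dataFileFor(String headerName) {
        return headerName.endsWith(".hdr")
                ? headerName.substring(0, headerName.length() - 4)
                : headerName;
    }
}
```

Every field this produces (samples, lines, bands, and so on) is a
property of the file returned by dataFileFor, not of the .hdr file
itself, which is the crux of the question above.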

Thanks for the input and any further suggestions.



-- 
------------------------------------------------------------------------------------------
Ann Bryant Burgess, PhD

Postdoctoral Fellow
Computer Science Department
University of Southern California
Viterbi School of Engineering
Los Angeles, CA

Alaska Science Center/USGS
Anchorage, AK

Cell:  (585) 738-7549
Office:  (907) 786-7059
Fax:  (907) 786-7150
E-mail: anniebryant.burgess@gmail.com
Office Address: 4210 University Dr., Anchorage, AK 99508-4626
-------------------------------------------------------------------------------------------


> ENVI header parser
> ------------------
>
>                 Key: TIKA-1274
>                 URL: https://issues.apache.org/jira/browse/TIKA-1274
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Ann Burgess
>            Assignee: Chris A. Mattmann
>              Labels: mime, newbie, parser, patch
>
> I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: 
> abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
>    Content-Encoding: ISO-8859-1
>    Content-Length: 818
>    Content-Type: application/envi.hdr
>    resourceName: MOD09GA_test_header.hdr
> abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
> ENVI
> description = {
>   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
> samples = 2400
> lines   = 2400
> bands   = 7
> header offset = 0
> file type = ENVI Standard
> data type = 2
> interleave = bip
> sensor type = Unknown
> byte order = 0
> map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
> projection info = {16, 6371007.2, 0.000000, 0.0, 0.0, Sinusoidal, units=Meters}
> coordinate system string = {PROJCS["Sinusoidal",GEOGCS["GCS_ELLIPSE_BASED_1",DATUM["D_ELLIPSE_BASED_1",SPHEROID["S_ELLIPSE_BASED_1",6371007.181,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],UNIT["Meter",1.0]]}
> wavelength units = Unknown
> ______________
> As someone who is not yet a certified committer, could someone enlighten me as to the steps needed to submit this new parser for review?
> The parser is located in my directory structure as: 
> /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
> My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml



--
This message was sent by Atlassian JIRA
(v6.2#6252)