Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/04/16 23:26:14 UTC

[jira] [Commented] (TIKA-1274) ENVI header parser

    [ https://issues.apache.org/jira/browse/TIKA-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971978#comment-13971978 ] 

Nick Burch commented on TIKA-1274:
----------------------------------

If these were changes to existing files, we'd need a patch file containing the changes for review.

As it's all new files, what we'd need attached to the ticket is:
 * The custom-mimetypes file that defines your new format
 * The parser java file(s)
 * A sample ENVI header file
 * A unit test file that tests the detection and parsing
 * Details of any new dependencies (if any)
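
As a rough illustration of the kind of detection and parsing checks such a unit test might cover, here is a plain-Java sketch. It deliberately uses no Tika classes, so it is not a Tika parser or a Tika test; EnviHeaderSketch, looksLikeEnvi and parseFields are made-up names. It assumes, based on the sample output quoted in the ticket description, that an ENVI header is a text file whose first line reads ENVI, followed by "key = value" pairs where a value opened with '{' may run over several lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Tika: plain-Java sketch of ENVI header checks. */
class EnviHeaderSketch {

    /** Detection (assumption): the first line of the file is the word ENVI. */
    static boolean looksLikeEnvi(String text) {
        String firstLine = text.split("\\R", 2)[0];
        return firstLine.trim().equals("ENVI");
    }

    /** Parsing: collect "key = value" pairs, in file order, after the first line. */
    static Map<String, String> parseFields(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] lines = text.split("\\R");
        for (int i = 1; i < lines.length; i++) {
            int eq = lines[i].indexOf('=');
            if (eq < 0) {
                continue; // not a key = value line
            }
            String key = lines[i].substring(0, eq).trim();
            StringBuilder value = new StringBuilder(lines[i].substring(eq + 1).trim());
            // A value that opens with '{' continues on following lines until '}' appears.
            while (value.indexOf("{") == 0 && value.indexOf("}") < 0 && i + 1 < lines.length) {
                value.append(' ').append(lines[++i].trim());
            }
            fields.put(key, value.toString());
        }
        return fields;
    }

    public static void main(String[] args) {
        // A few lines taken from the sample header quoted in the ticket.
        String sample = String.join("\n",
                "ENVI",
                "description = {",
                "  GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}",
                "samples = 2400",
                "interleave = bip");
        System.out.println("Looks like ENVI: " + looksLikeEnvi(sample));
        parseFields(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

A real unit test would instead run the sample .hdr file through Tika's detection and the new parser, and assert on the resulting content type and extracted text; the checks above only show the sort of things worth asserting.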

For general advice on contributing, patches, tests, etc., the Apache Nutch project has some good wiki pages describing all of that, most of which apply equally to Apache Tika:
 * https://wiki.apache.org/nutch/HowToContribute
 * https://wiki.apache.org/nutch/Becoming_A_Nutch_Developer

Another good source is the ComDev (Apache Community Development) site. Pick "For Contributors" from the menu and look through the pages in that section.

For an example of a simple Tika parser plus a simple Tika parser unit test, I can suggest the VorbisParser as it was in late 2011, when it largely supported only one file format (Ogg Vorbis), before additional Ogg-based formats were added. You can see that at something like https://github.com/Gagravarr/VorbisJava/tree/f6d20407477011735c16daf947635f1b67e14660/tika

> ENVI header parser
> ------------------
>
>                 Key: TIKA-1274
>                 URL: https://issues.apache.org/jira/browse/TIKA-1274
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Ann Burgess
>              Labels: mime, newbie, parser, patch
>
> I have written a parser that extracts text and metadata from ENVI header files, currently called at the command line as: 
> abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --metadata MOD09GA_test_header.hdr
>    Content-Encoding: ISO-8859-1
>    Content-Length: 818
>    Content-Type: application/envi.hdr
>    resourceName: MOD09GA_test_header.hdr
> abryant:tika abryant$ java -classpath annie-envi-parser.jar:tika-app/target/tika-app-1.6-SNAPSHOT.jar org.apache.tika.cli.TikaCLI --text MOD09GA_test_header.hdr
> ENVI
> description = {
>   GEO-TIFF File Imported into ENVI [Fri May 25 14:06:23 2012]}
> samples = 2400
> lines   = 2400
> bands   = 7
> header offset = 0
> file type = ENVI Standard
> data type = 2
> interleave = bip
> sensor type = Unknown
> byte order = 0
> map info = {Sinusoidal, 1.5000, 1.5000, -10007091.3643, 5559289.2856, 4.6331271653e+02, 4.6331271653e+02, , units=Meters}
> projection info = {16, 6371007.2, 0.000000, 0.0, 0.0, Sinusoidal, units=Meters}
> coordinate system string = {PROJCS["Sinusoidal",GEOGCS["GCS_ELLIPSE_BASED_1",DATUM["D_ELLIPSE_BASED_1",SPHEROID["S_ELLIPSE_BASED_1",6371007.181,0.0]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Sinusoidal"],PARAMETER["False_Easting",0.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",0.0],UNIT["Meter",1.0]]}
> wavelength units = Unknown
> ______________
> As I am currently a non-certified committer, could someone enlighten me as to the steps needed to submit this new parser for review?
> The parser is located in my directory structure as: 
> /users/annbryant/tika/tika/anniedev/src/main/java/edu/usc/sunset/abburgess/tika/EnviFileReader.class
> My custom mimetypes.xml file is located at: /Users/annbryant/TIKA/tika/anniedev/src/main/resources/org/apache/tika/mime/custom-mimetypes.xml


