Posted to dev@tika.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/24 04:20:00 UTC

[jira] [Created] (TIKA-2636) ENVI Header metadata fields can span more than one line

Lewis John McGibbney created TIKA-2636:
------------------------------------------

             Summary: ENVI Header metadata fields can span more than one line
                 Key: TIKA-2636
                 URL: https://issues.apache.org/jira/browse/TIKA-2636
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.17
            Reporter: Lewis John McGibbney
            Assignee: Lewis John McGibbney
             Fix For: 2.0.0
         Attachments: ang20150420t182050_corr_v1e_img.hdr

[~tpalsulich] was correct when [he stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140] "...See below for how to read and output line by line (copy & paste between the xml start/end in EnviHeaderParser). I have a hunch this isn't really what we want -- what if a metadata field has a newline in it? What if the line is too long to fit into a string? On the other hand, with nice input, it's much nicer output."

As it turns out, ENVI header metadata fields can span more than one line. An example follows:

{code}
1.    ENVI
2.    description = {
3.      Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
4.      Jun 10 04:48:52 2015]}
5.    samples = 739
6.    lines = 14674
7.    bands = 432
8.    header offset = 0
9.    file type = ENVI Standard
10.    data type = 4
11.    interleave = bil
12.    sensor type = Unknown
13.    byte order = 0
14.    map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
15.    wavelength units = Nanometers
...
{code}

This happens when a metadata field value is enclosed in curly brackets. In the example above, the description value spans three lines (L2-L4), whereas the map info value fits on a single line (L14).

Fixing this requires a patch to the [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java].
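
To illustrate the behaviour such a patch might need, here is a minimal sketch of reading a header so that a value opened with a curly bracket is accumulated until its closing bracket is seen. It is not the existing EnviHeaderParser code and not a proposed final fix: the class name EnviHeaderFields and the method parse are hypothetical, and it assumes that brackets are not nested, that field names never contain '=', and that the real header file does not carry the "1.", "2." line prefixes shown in the example above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of Tika): collects ENVI header fields as key/value pairs. */
final class EnviHeaderFields {

    /** Parses header text, joining brace-delimited values that span several lines. */
    static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(header))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // e.g. the leading "ENVI" line or a blank line
                }
                String key = line.substring(0, eq).trim();
                StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
                // Assumption: a value starting with '{' runs until the first '}' (no nesting).
                if (value.length() > 0 && value.charAt(0) == '{') {
                    while (value.indexOf("}") < 0 && (line = reader.readLine()) != null) {
                        value.append(' ').append(line.trim());
                    }
                }
                fields.put(key, value.toString());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    public static void main(String[] args) {
        String header = String.join("\n",
                "ENVI",
                "description = {",
                "  Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed",
                "  Jun 10 04:48:52 2015]}",
                "samples = 739",
                "map info = { UTM , 1.000 , 1.000 , units=Meters , rotation=75.00000000 }");
        // Print each field on a single line, whatever its layout in the header.
        parse(header).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Because continuation lines are consumed inside the inner loop, an '=' that appears within a bracketed value (such as units=Meters in map info) is not mistaken for the start of a new field. The sketch keeps the surrounding brackets in the stored value; whether they should be stripped is a separate decision for the actual patch.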



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)