Posted to dev@tika.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/05/01 17:38:00 UTC

[jira] [Updated] (TIKA-2636) ENVI Header metadata fields can span more than one line

     [ https://issues.apache.org/jira/browse/TIKA-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated TIKA-2636:
---------------------------------------
    Fix Version/s:     (was: 2.0.0)
                   1.19

> ENVI Header metadata fields can span more than one line
> -------------------------------------------------------
>
>                 Key: TIKA-2636
>                 URL: https://issues.apache.org/jira/browse/TIKA-2636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.18
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 1.19
>
>         Attachments: ang20150420t182050_corr_v1e_img.hdr
>
>
> [~tpalsulich] was correct when [he stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140] "...See below for how to read and output line by line (copy & paste between the xml start/end in EnviHeaderParser). I have a hunch this isn't really what we want -- what if a metadata field has a newline in it? What if the line is too long to fit into a string? On the other hand, with nice input, it's much nicer output."
> As it turns out, ENVI header metadata fields can span more than one line. For example:
> {code}
> 1.    ENVI
> 2.    description = {
> 3.      Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
> 4.      Jun 10 04:48:52 2015]}
> 5.    samples = 739
> 6.    lines = 14674
> 7.    bands = 432
> 8.    header offset = 0
> 9.    file type = ENVI Standard
> 10.    data type = 4
> 11.    interleave = bil
> 12.    sensor type = Unknown
> 13.    byte order = 0
> 14.    map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
> 15.    wavelength units = Nanometers
> ...
> {code}
> The problem arises when a metadata field value is enclosed in curly braces. In the example above, L2-L4 show a value spread over three lines, while L14 shows a braced value contained on a single line.
> Fixing this requires a patch to the [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java].
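One way to handle the braced case is to track curly-brace depth while reading. When a `key = value` line leaves a `{` unclosed, the following lines are appended to the value until the braces balance. The sketch below shows that approach. It is not the actual patch to Tika's `EnviHeaderParser`. The class `EnviHeaderSketch` and its `parse` method are hypothetical names, and the sketch assumes two things: every field starts on a new line as `key = value`, and braces never appear unbalanced inside free-text values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not Tika's EnviHeaderParser implementation.
public class EnviHeaderSketch {

    /**
     * Parses "key = value" fields from ENVI header text. If a value leaves a
     * '{' unclosed, subsequent lines are joined to it (separated by a space)
     * until the braces balance again.
     */
    public static Map<String, String> parse(String header) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] lines = header.split("\\r?\\n");
        int i = 0;
        while (i < lines.length) {
            String line = lines[i++];
            // Split on the first '=' only, so values like "units=Meters" stay intact.
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the leading "ENVI" magic line or a blank line
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            int depth = braceDepth(value);
            // While a '{' is still open, the value continues on the next line.
            while (depth > 0 && i < lines.length) {
                String next = lines[i++].trim();
                value.append(' ').append(next);
                depth += braceDepth(next);
            }
            fields.put(key, value.toString());
        }
        return fields;
    }

    // Net brace count: +1 for each '{', -1 for each '}'.
    private static int braceDepth(CharSequence s) {
        int depth = 0;
        for (int k = 0; k < s.length(); k++) {
            char c = s.charAt(k);
            if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        String header = String.join("\n",
            "ENVI",
            "description = {",
            "  Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed",
            "  Jun 10 04:48:52 2015]}",
            "samples = 739",
            "map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }");
        parse(header).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

With this input, the three-line `description` value (L2-L4 in the example) comes back as a single string. The single-line `map info` value (L14) passes through unchanged, because its braces already balance on that line. A real parser would also need to cope with braces that appear inside free text.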



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)