Posted to dev@tika.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2018/04/24 04:20:00 UTC
[jira] [Created] (TIKA-2636) ENVI Header metadata fields can span more than one line
Lewis John McGibbney created TIKA-2636:
------------------------------------------
Summary: ENVI Header metadata fields can span more than one line
Key: TIKA-2636
URL: https://issues.apache.org/jira/browse/TIKA-2636
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.17
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 2.0.0
Attachments: ang20150420t182050_corr_v1e_img.hdr
[~tpalsulich] was correct when [he stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140] "...See below for how to read and output line by line (copy & paste between the xml start/end in EnviHeaderParser). I have a hunch this isn't really what we want -- what if a metadata field has a newline in it? What if the line is too long to fit into a string? On the other hand, with nice input, it's much nicer output."
As it turns out, ENVI header metadata fields can span more than one line. An example is as follows:
{code}
1. ENVI
2. description = {
3. Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
4. Jun 10 04:48:52 2015]}
5. samples = 739
6. lines = 14674
7. bands = 432
8. header offset = 0
9. file type = ENVI Standard
10. data type = 4
11. interleave = bil
12. sensor type = Unknown
13. byte order = 0
14. map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
15. wavelength units = Nanometers
...
{code}
The multi-line case arises when a metadata field value is enclosed in curly brackets. In the example above, lines 2-4 show a value spread over three lines, and line 14 shows a brace-enclosed value that fits on a single line.
This requires a patch to the [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java].
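To illustrate the kind of handling this would need, here is a minimal sketch of joining brace-enclosed values across lines. It is not the actual patch and does not use any Tika API: the class name EnviHeaderFieldReader and the method readFields are hypothetical, and it assumes that a value starting with '{' runs until the first line containing '}' and that nested braces do not occur.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not part of Tika.
final class EnviHeaderFieldReader {

    // Splits header text into key/value pairs. A value that opens with '{'
    // is extended with following lines until a '}' has been seen
    // (assumption: braces are not nested).
    static Map<String, String> readFields(String headerText) {
        Map<String, String> fields = new LinkedHashMap<>();
        String[] lines = headerText.split("\\r?\\n");
        int i = 0;
        while (i < lines.length) {
            String line = lines[i++];
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the leading "ENVI" line or a blank line
            }
            String key = line.substring(0, eq).trim();
            StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
            if (value.length() > 0 && value.charAt(0) == '{') {
                // Consume continuation lines, joining them with a single space.
                while (value.indexOf("}") < 0 && i < lines.length) {
                    value.append(' ').append(lines[i++].trim());
                }
            }
            fields.put(key, value.toString());
        }
        return fields;
    }
}
```

Because the key is split on the first '=' only, and continuation lines are consumed before the next key lookup, an '=' inside a braced value (such as units=Meters in the map info field) is not mistaken for a new field. A real fix would read from the parser's input stream rather than a String and would need to decide how to treat an unterminated '{'.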