Posted to dev@tika.apache.org by Ann Burgess <an...@gmail.com> on 2014/06/23 23:43:17 UTC

Review Request 22892: New parser for ENVI header files

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/
-----------------------------------------------------------

Review request for tika.


Bugs: TIKA-1274
    https://issues.apache.org/jira/browse/TIKA-1274


Repository: tika


Description
-------

New parser for ENVI header files. Note: this parser handles header files that have an associated, separate data file; it does not extract content from that data file.


Diffs
-----

  trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr PRE-CREATION 

Diff: https://reviews.apache.org/r/22892/diff/


Testing
-------

Text parsing test completed with file envi_test_header.hdr. 


Thanks,

Ann Burgess


Re: Review Request 22892: New parser for ENVI header files

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46457
-----------------------------------------------------------



trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
<https://reviews.apache.org/r/22892/#comment81848>

    org.apache.tika.parser.envi



trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java
<https://reviews.apache.org/r/22892/#comment81849>

    org.apache.tika.parser.envi


- Chris Mattmann


On June 23, 2014, 9:43 p.m., Ann Burgess wrote:


Re: Review Request 22892: New parser for ENVI header files

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46628
-----------------------------------------------------------

Ship it!


- Chris Mattmann


On June 23, 2014, 11:14 p.m., Ann Burgess wrote:


Re: Review Request 22892: New parser for ENVI header files

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46631
-----------------------------------------------------------



trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
<https://reviews.apache.org/r/22892/#comment82136>

    Good comment, Nick. I committed this patch without that improvement; we can make it later under a new issue.


- Chris Mattmann


On June 23, 2014, 11:14 p.m., Ann Burgess wrote:


Re: Review Request 22892: New parser for ENVI header files

Posted by Ann Burgess <an...@gmail.com>.

> On June 24, 2014, 12:28 p.m., Nick Burch wrote:
> > trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java, lines 75-82
> > <https://reviews.apache.org/r/22892/diff/3/?file=615266#file615266line75>
> >
> >     This might be better done with something like a BufferedReader, so you can read the ENVI file one line at a time and output each line into its own <p> tag, or its own <li> tag within a <ul>.

Thanks for the input, Nick. I'm working on implementing the BufferedReader approach now.


- Ann


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46518
-----------------------------------------------------------


On June 23, 2014, 11:14 p.m., Ann Burgess wrote:


Re: Review Request 22892: New parser for ENVI header files

Posted by Tyler Palsulich <tp...@gmail.com>.

> On June 24, 2014, 12:28 p.m., Nick Burch wrote:
> >

Looking into this more, AutoDetectReader is already a subclass of BufferedReader. Should we, as discussed in [1], be reading chunk by chunk, as this code (and TXTParser) currently does manually? If so, we should just use the built-in BufferedReader implementation, which leads back to AutoDetectReader: we should add a constructor that accepts a buffer size and passes it along in the super constructor call. Once that exists, we can clean this up to read chunk by chunk properly. Alternatively, we skip all that and read line by line with reader.readLine(), as in the original StackOverflow question ;).

[1] - http://stackoverflow.com/questions/17084657/most-robust-way-of-reading-a-file-or-stream-using-java-to-prevent-dos-attacks


- Tyler
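The chunk-by-chunk option above can be sketched in plain Java. This is a minimal sketch under stated assumptions, not Tika code: `SizedReader` is a hypothetical class standing in for the proposed extra AutoDetectReader constructor (it only forwards a buffer size to `BufferedReader(Reader, int)`), and `readBounded` with its chunk size and character cap is likewise an illustrative helper.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ChunkedReadSketch {

    // Hypothetical stand-in for the proposed AutoDetectReader constructor:
    // it only forwards the caller's buffer size to BufferedReader(Reader, int).
    static class SizedReader extends BufferedReader {
        SizedReader(Reader in, int bufferSize) {
            super(in, bufferSize);
        }
    }

    /** Reads chunkSize characters at a time, stopping at end of stream or after maxChars. */
    static String readBounded(Reader source, int chunkSize, int maxChars) {
        StringBuilder out = new StringBuilder();
        char[] chunk = new char[chunkSize];
        try (BufferedReader reader = new SizedReader(source, chunkSize)) {
            int n;
            // Never request more than the remaining allowance, so the cap is exact.
            while (out.length() < maxChars
                    && (n = reader.read(chunk, 0, Math.min(chunkSize, maxChars - out.length()))) != -1) {
                out.append(chunk, 0, n);
            }
        } catch (IOException e) {
            // Wrapped so callers of this sketch need not handle a checked exception.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints "abcde": the 5-character cap stops reading early.
        System.out.println(readBounded(new StringReader("abcdefghij"), 3, 5));
    }
}
```

Capping the total number of characters read is one simple guard against the unbounded-input concern raised in the linked StackOverflow question; the chunk size and cap values here are arbitrary.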


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46518
-----------------------------------------------------------


On June 23, 2014, 11:14 p.m., Ann Burgess wrote:


Re: Review Request 22892: New parser for ENVI header files

Posted by Nick Burch <ni...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46518
-----------------------------------------------------------



trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java
<https://reviews.apache.org/r/22892/#comment81964>

    This might be better done with something like a BufferedReader, so you can read the ENVI file one line at a time and output each line into its own <p> tag, or its own <li> tag within a <ul>.


- Nick Burch
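The line-by-line suggestion above can be sketched in plain Java. This is a minimal sketch, not the committed parser: a `StringBuilder` stands in for Tika's `XHTMLContentHandler` so the example runs without Tika on the classpath, `toXhtmlList` is a hypothetical helper name, and the three-line header in `main` is a made-up input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch only: a StringBuilder stands in for Tika's XHTMLContentHandler
// so that this runs without Tika on the classpath.
public class EnviLineByLineSketch {

    /** Reads the header one line at a time and wraps each line in an <li> inside a <ul>. */
    static String toXhtmlList(Reader source) {
        StringBuilder out = new StringBuilder("<ul>");
        // try-with-resources closes the BufferedReader and the Reader it wraps.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            // readLine() strips the line terminator and returns null at end of stream.
            while ((line = reader.readLine()) != null) {
                out.append("<li>").append(escape(line)).append("</li>");
            }
        } catch (IOException e) {
            // Wrapped so callers of this sketch need not handle a checked exception.
            throw new UncheckedIOException(e);
        }
        return out.append("</ul>").toString();
    }

    // Minimal escaping for this string-based sketch; with a SAX ContentHandler
    // the text is passed as characters and escaped when serialized.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String header = "ENVI\nsamples = 100\nlines = 50\n";
        // Prints <ul><li>ENVI</li><li>samples = 100</li><li>lines = 50</li></ul>
        System.out.println(toXhtmlList(new StringReader(header)));
    }
}
```

In a real parser each line would go to the content handler instead of a string; emitting one <p> per line rather than one <li> would be a one-line change.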


On June 23, 2014, 11:14 p.m., Ann Burgess wrote:


Re: Review Request 22892: New parser for ENVI header files

Posted by Ann Burgess <an...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/
-----------------------------------------------------------

(Updated June 23, 2014, 11:14 p.m.)


Review request for tika.


Bugs: TIKA-1274
    https://issues.apache.org/jira/browse/TIKA-1274


Repository: tika


Description
-------

New parser for ENVI header files. Note: this parser handles header files that have an associated, separate data file; it does not extract content from that data file.


Diffs (updated)
-----

  trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr PRE-CREATION 

Diff: https://reviews.apache.org/r/22892/diff/


Testing
-------

Text parsing test completed with file envi_test_header.hdr. 


Thanks,

Ann Burgess


Re: Review Request 22892: New parser for ENVI header files

Posted by Ann Burgess <an...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/
-----------------------------------------------------------

(Updated June 23, 2014, 10:01 p.m.)


Review request for tika.


Bugs: TIKA-1274
    https://issues.apache.org/jira/browse/TIKA-1274


Repository: tika


Description
-------

New parser for ENVI header files. Note: this parser handles header files that have an associated, separate data file; it does not extract content from that data file.


Diffs (updated)
-----

  trunk/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java PRE-CREATION 
  trunk/tika-parsers/src/test/java/org/apache/tika/parser/envi/EnviHeaderParserTest.java PRE-CREATION 
  trunk/tika-parsers/src/test/resources/test-documents/envi_test_header.hdr PRE-CREATION 

Diff: https://reviews.apache.org/r/22892/diff/


Testing
-------

Text parsing test completed with file envi_test_header.hdr. 


Thanks,

Ann Burgess


Re: Review Request 22892: New parser for ENVI header files

Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22892/#review46459
-----------------------------------------------------------


Looks great, Annie. With the package updates, I think I can commit this.

- Chris Mattmann


On June 23, 2014, 9:43 p.m., Ann Burgess wrote: