Posted to dev@tika.apache.org by Tyler Palsulich <tp...@gmail.com> on 2014/06/10 00:18:30 UTC
Review Request 22402: Tika OCR
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/
-----------------------------------------------------------
Review request for tika and Chris Mattmann.
Repository: tika
Description
-------
Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
Diffs
-----
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1601508
trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1601508
trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java 1601508
Diff: https://reviews.apache.org/r/22402/diff/
Testing
-------
Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
Thanks,
Tyler Palsulich
Re: Review Request 22402: Tika OCR
Posted by Tyler Palsulich <tp...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/#review45159
-----------------------------------------------------------
Need to add a license header to the top of each new file.
- Tyler Palsulich
On June 9, 2014, 10:18 p.m., Tyler Palsulich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22402/
> -----------------------------------------------------------
>
> (Updated June 9, 2014, 10:18 p.m.)
>
>
> Review request for tika and Chris Mattmann.
>
>
> Repository: tika
>
>
> Description
> -------
>
> Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
>
>
> Diffs
> -----
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
> trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1601508
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1601508
> trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java 1601508
>
> Diff: https://reviews.apache.org/r/22402/diff/
>
>
> Testing
> -------
>
> Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
>
>
> Thanks,
>
> Tyler Palsulich
>
>
Re: Review Request 22402: Tika OCR
Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/#review50138
-----------------------------------------------------------
Ship it!
Add license headers, otherwise looks good.
- Chris Mattmann
On June 9, 2014, 10:18 p.m., Tyler Palsulich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22402/
> -----------------------------------------------------------
>
> (Updated June 9, 2014, 10:18 p.m.)
>
>
> Review request for tika and Chris Mattmann.
>
>
> Repository: tika
>
>
> Description
> -------
>
> Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
>
>
> Diffs
> -----
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
> trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1601508
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1601508
> trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java 1601508
>
> Diff: https://reviews.apache.org/r/22402/diff/
>
>
> Testing
> -------
>
> Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
>
>
> Thanks,
>
> Tyler Palsulich
>
>
Re: Review Request 22402: Tika OCR
Posted by Chris Mattmann <ma...@apache.org>.
> On Sept. 19, 2014, 6:14 a.m., Chris Mattmann wrote:
> > Ship It!
Ready to go, @tpalsulich. Tested on my machine and it looks great! Thanks, everyone!
- Chris
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/#review53940
-----------------------------------------------------------
On Sept. 18, 2014, 10:07 p.m., Tyler Palsulich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22402/
> -----------------------------------------------------------
>
> (Updated Sept. 18, 2014, 10:07 p.m.)
>
>
> Review request for tika and Chris Mattmann.
>
>
> Bugs: TIKA-93
> https://issues.apache.org/jira/browse/TIKA-93
>
>
> Repository: tika
>
>
> Description
> -------
>
> Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
>
>
> Diffs
> -----
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1624766
>
> Diff: https://reviews.apache.org/r/22402/diff/
>
>
> Testing
> -------
>
> Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
>
>
> Thanks,
>
> Tyler Palsulich
>
>
Re: Review Request 22402: Tika OCR
Posted by Chris Mattmann <ma...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/#review53940
-----------------------------------------------------------
Ship it!
Ship It!
- Chris Mattmann
On Sept. 18, 2014, 10:07 p.m., Tyler Palsulich wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22402/
> -----------------------------------------------------------
>
> (Updated Sept. 18, 2014, 10:07 p.m.)
>
>
> Review request for tika and Chris Mattmann.
>
>
> Bugs: TIKA-93
> https://issues.apache.org/jira/browse/TIKA-93
>
>
> Repository: tika
>
>
> Description
> -------
>
> Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
>
>
> Diffs
> -----
>
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
> trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
> trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1624766
>
> Diff: https://reviews.apache.org/r/22402/diff/
>
>
> Testing
> -------
>
> Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
>
>
> Thanks,
>
> Tyler Palsulich
>
>
Re: Review Request 22402: Tika OCR
Posted by Tyler Palsulich <tp...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/
-----------------------------------------------------------
(Updated Sept. 18, 2014, 10:07 p.m.)
Review request for tika and Chris Mattmann.
Changes
-------
Updated the patch to use JUnit's Assume to skip the tests when Tesseract is not installed, and cleaned up some of the exception throwing.
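For readers unfamiliar with the pattern, here is a minimal sketch of the kind of availability check such a test could rely on. It is not taken from the patch: the class name TesseractCheck, the helper canRun, the 5-second timeout, and the assumption that `tesseract -v` exits with status 0 are all illustrative. In a JUnit 4 test, the boolean result could be passed to org.junit.Assume.assumeTrue so the test is reported as skipped rather than failed when the binary is missing.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the reviewed patch.
public class TesseractCheck {

    /**
     * Returns true if the given command can be started and exits with
     * status 0 within five seconds. A missing or non-executable binary
     * makes ProcessBuilder.start() throw IOException, which is reported
     * here as "not available".
     */
    public static boolean canRun(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so the child cannot block on a full pipe (Java 9+).
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!process.waitFor(5, TimeUnit.SECONDS)) {
                process.destroyForcibly(); // treat a hung process as unavailable
                return false;
            }
            return process.exitValue() == 0;
        } catch (IOException e) {
            return false; // binary not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "tesseract -v" prints version information and exits with 0.
        boolean available = canRun("tesseract", "-v");
        System.out.println("Tesseract available: " + available);
        // In a JUnit 4 test one would write, for example:
        //   org.junit.Assume.assumeTrue(TesseractCheck.canRun("tesseract", "-v"));
    }
}
```

Keeping the check in a plain helper means the same result can gate every OCR test, so the suite passes on machines without Tesseract while still exercising the parser where it is installed.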
Bugs: TIKA-93
https://issues.apache.org/jira/browse/TIKA-93
Repository: tika
Description
-------
Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
Diffs (updated)
-----
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1624766
Diff: https://reviews.apache.org/r/22402/diff/
Testing
-------
Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
Thanks,
Tyler Palsulich
Re: Review Request 22402: Tika OCR
Posted by Tyler Palsulich <tp...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/
-----------------------------------------------------------
(Updated Sept. 15, 2014, 10:23 p.m.)
Review request for tika and Chris Mattmann.
Changes
-------
Passes all tests whether Tesseract is installed or not.
Bugs: TIKA-93
https://issues.apache.org/jira/browse/TIKA-93
Repository: tika
Description
-------
Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
Diffs (updated)
-----
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1624766
Diff: https://reviews.apache.org/r/22402/diff/
Testing
-------
Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
Thanks,
Tyler Palsulich
Re: Review Request 22402: Tika OCR
Posted by Tyler Palsulich <tp...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22402/
-----------------------------------------------------------
(Updated Aug. 11, 2014, 4:26 a.m.)
Review request for tika and Chris Mattmann.
Bugs: TIKA-93
https://issues.apache.org/jira/browse/TIKA-93
Repository: tika
Description
-------
Integrating Tesseract OCR with Tika through a new Parser. See TIKA-93.
Diffs
-----
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java PRE-CREATION
trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java PRE-CREATION
trunk/tika-parsers/src/main/resources/META-INF/services/org.apache.tika.parser.Parser 1601508
trunk/tika-parsers/src/test/java/org/apache/tika/parser/ocr/TesseractOCRTest.java PRE-CREATION
trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java 1601508
trunk/tika-server/src/test/java/org/apache/tika/server/TikaMimeTypesTest.java 1601508
Diff: https://reviews.apache.org/r/22402/diff/
Testing
-------
Extracting the text from an embedded image in a DOCX, PPTX, and PDF.
Thanks,
Tyler Palsulich