Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/05 15:59:00 UTC
[jira] [Created] (TIKA-3346) Parsers should only appear once in the "parsed by" metadata value
Tim Allison created TIKA-3346:
---------------------------------
Summary: Parsers should only appear once in the "parsed by" metadata value
Key: TIKA-3346
URL: https://issues.apache.org/jira/browse/TIKA-3346
Project: Tika
Issue Type: Task
Reporter: Tim Allison
[~peterkronenberg] noted on the user list that, since the integration of the OCR parser was reworked, the default parser and the TesseractOCRParser are added to the metadata once for every page in a PDF. This only happens with "inline" OCR. We should restrict the "X-TIKA-ParsedBy" value to a unique list of parsers to avoid duplication.
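A minimal sketch of the "unique list" idea, for illustration only. It does not use Tika's actual Metadata class: a plain Map stands in for a multi-valued metadata object, and the class name ParsedByDedup and the helper addParsedBy are hypothetical names made up for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata store; not Tika's Metadata API.
class ParsedByDedup {
    // Key name as written in this issue.
    static final String PARSED_BY = "X-TIKA-ParsedBy";

    // Record a parser name only if it is not already listed, keeping first-seen order.
    static void addParsedBy(Map<String, List<String>> metadata, String parserClassName) {
        List<String> values = metadata.computeIfAbsent(PARSED_BY, k -> new ArrayList<>());
        if (!values.contains(parserClassName)) {
            values.add(parserClassName);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = new HashMap<>();
        addParsedBy(metadata, "org.apache.tika.parser.DefaultParser");
        addParsedBy(metadata, "org.apache.tika.parser.pdf.PDFParser");
        // Simulate inline OCR on a three-page PDF: both parsers are reported per page.
        for (int page = 0; page < 3; page++) {
            addParsedBy(metadata, "org.apache.tika.parser.DefaultParser");
            addParsedBy(metadata, "org.apache.tika.parser.ocr.TesseractOCRParser");
        }
        // Each parser is listed once, however many pages were processed.
        System.out.println(metadata.get(PARSED_BY));
    }
}
```

The linear contains() check is fine here because the list of parsers for one document is short; the order in which parsers were first seen is preserved.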
If anyone has a better option, let me know. I was thinking about sending in a dummy metadata object but that got messy quickly...
--
This message was sent by Atlassian Jira
(v8.3.4#803005)