Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/05 15:59:00 UTC

[jira] [Created] (TIKA-3346) Parsers should only appear once in the "parsed by" metadata value

Tim Allison created TIKA-3346:
---------------------------------

             Summary: Parsers should only appear once in the "parsed by" metadata value
                 Key: TIKA-3346
                 URL: https://issues.apache.org/jira/browse/TIKA-3346
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


[~peterkronenberg] noted on the user list that, since the integration of the OCR parser was reworked, the default parser and the TesseractOCRParser are added to the "parsed by" metadata value once for every page in a PDF. This only happens with "inline" OCR. We should add a parser to "X-TIKA-ParsedBy" only if it is not already listed, so the value stays a list of unique parsers with no duplicates.
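A minimal Java sketch of the proposed unique-list behavior, assuming a simplified multi-valued metadata store. This is not Tika's own Metadata class. `ParsedByDedup`, `addIfAbsent`, `simulateInlineOcr` and the per-page recording order are hypothetical, chosen only to reproduce the symptom described above and show the check-before-add fix.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a multi-valued metadata object.
 * This is NOT Tika's Metadata class; it only models "one key, many values".
 */
public class ParsedByDedup {

    // Key name as written in the issue.
    public static final String PARSED_BY = "X-TIKA-ParsedBy";

    private final Map<String, List<String>> values = new HashMap<>();

    /** Appends a value unconditionally: the behavior that produces duplicates. */
    public void add(String key, String value) {
        values.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Appends a value only if it is not already recorded for this key. */
    public void addIfAbsent(String key, String value) {
        List<String> current = values.computeIfAbsent(key, k -> new ArrayList<>());
        if (!current.contains(value)) {
            current.add(value);
        }
    }

    /** Returns a copy of the values recorded for a key, in insertion order. */
    public List<String> getValues(String key) {
        return new ArrayList<>(values.getOrDefault(key, new ArrayList<>()));
    }

    /**
     * Simulates a PDF parsed with inline OCR: the outer parsers are recorded
     * once, then the default parser and the OCR parser once per page.
     * This recording order is an assumption for illustration.
     */
    public static void simulateInlineOcr(ParsedByDedup md, int pages, boolean dedupe) {
        recordParser(md, "DefaultParser", dedupe);
        recordParser(md, "PDFParser", dedupe);
        for (int page = 0; page < pages; page++) {
            recordParser(md, "DefaultParser", dedupe);
            recordParser(md, "TesseractOCRParser", dedupe);
        }
    }

    private static void recordParser(ParsedByDedup md, String parser, boolean dedupe) {
        if (dedupe) {
            md.addIfAbsent(PARSED_BY, parser);
        } else {
            md.add(PARSED_BY, parser);
        }
    }

    public static void main(String[] args) {
        ParsedByDedup appended = new ParsedByDedup();
        simulateInlineOcr(appended, 3, false);
        ParsedByDedup unique = new ParsedByDedup();
        simulateInlineOcr(unique, 3, true);
        // 8 entries: DefaultParser and TesseractOCRParser repeat for every page
        System.out.println("append: " + appended.getValues(PARSED_BY));
        // prints "unique: [DefaultParser, PDFParser, TesseractOCRParser]"
        System.out.println("unique: " + unique.getValues(PARSED_BY));
    }
}
```

Calling `contains` on a `List` is a linear scan, which is harmless for the handful of parser names a single document collects, and it keeps first-seen order; a `LinkedHashSet` would preserve the same order with constant-time lookups.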

If anyone has a better option, let me know. I was thinking about passing in a dummy metadata object, but that got messy quickly...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)