Posted to dev@tika.apache.org by "Saurabh Patil (JIRA)" <ji...@apache.org> on 2018/05/24 14:36:00 UTC

[jira] [Created] (TIKA-2650) Soft-hyphen is not extracted properly

Saurabh Patil created TIKA-2650:
-----------------------------------

             Summary: Soft-hyphen is not extracted properly
                 Key: TIKA-2650
                 URL: https://issues.apache.org/jira/browse/TIKA-2650
             Project: Tika
          Issue Type: Bug
          Components: app
    Affects Versions: 1.18
            Reporter: Saurabh Patil
         Attachments: Peter Rabbit.pdf

We are trying to extract text from a PDF. When a long word falls at the end of a line, the PDF breaks it with a soft hyphen after the first part, and the rest of the word continues on the next line. When extracting this text, Tika automatically replaces the soft hyphen with a space, so the word comes out split in two.
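
To make the report concrete, here is a minimal sketch in plain Java of the behaviour described above. It does not call Tika. The class name, both helper methods, and the sample sentence are hypothetical and only for illustration, and it assumes the soft hyphen (U+00AD) is immediately followed by the line break in the underlying text, which may not match how the attached PDF actually encodes it.

```java
public class SoftHyphenExample {
    // U+00AD SOFT HYPHEN, the character the report is about.
    static final char SOFT_HYPHEN = '\u00AD';

    // Hypothetical helper: the result the report appears to expect.
    // A soft hyphen followed by a line break is dropped so the two
    // halves of the word are joined again.
    static String joinAtSoftHyphen(String text) {
        return text.replaceAll("\u00AD\\r?\\n", "");
    }

    // Hypothetical helper: mimics the behaviour the report describes,
    // where the soft hyphen at the line end turns into a space.
    // This is an assumption about the observed output, not Tika code.
    static String replaceSoftHyphenWithSpace(String text) {
        return text.replaceAll("\u00AD\\r?\\n", " ");
    }

    public static void main(String[] args) {
        // Made-up sample line, not taken from the attached PDF.
        String raw = "The rab" + SOFT_HYPHEN + "\nbit ran away.";

        System.out.println(joinAtSoftHyphen(raw));           // The rabbit ran away.
        System.out.println(replaceSoftHyphenWithSpace(raw)); // The rab bit ran away.
    }
}
```

The first line printed is the kind of output we would like to get from extraction; the second is the kind of output we are seeing.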

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)