Posted to dev@tika.apache.org by "Saurabh Patil (JIRA)" <ji...@apache.org> on 2018/05/24 14:36:00 UTC
[jira] [Created] (TIKA-2650) Soft-hyphen is not extracted properly
Saurabh Patil created TIKA-2650:
-----------------------------------
Summary: Soft-hyphen is not extracted properly
Key: TIKA-2650
URL: https://issues.apache.org/jira/browse/TIKA-2650
Project: Tika
Issue Type: Bug
Components: app
Affects Versions: 1.18
Reporter: Saurabh Patil
Attachments: Peter Rabbit.pdf
We are trying to extract text from a PDF. When a long word falls at the end of a line, the PDF breaks it with a soft hyphen after the first part and the rest of the word continues on the next line. When extracting this text, however, Tika automatically replaces the soft hyphen with a space, so the word comes out split in two.
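To make the expected result concrete, here is a minimal sketch of the output we are hoping for, written as plain string handling. It does not call Tika at all: the class name SoftHyphenDemo, both helper methods, and the sample sentence are hypothetical and only for illustration, and it assumes the extracted text would carry the soft hyphen as U+00AD immediately before the line break.

```java
public class SoftHyphenDemo {
    // U+00AD is the Unicode SOFT HYPHEN character.
    static final char SOFT_HYPHEN = '\u00AD';

    // Hypothetical helper: true if the text still contains a soft hyphen.
    static boolean containsSoftHyphen(String text) {
        return text.indexOf(SOFT_HYPHEN) >= 0;
    }

    // Hypothetical helper: rejoins a word that was broken by a soft hyphen
    // at the end of a line, removing the soft hyphen and the line break.
    // Soft hyphens that are not followed by a line break are left alone.
    static String joinSoftHyphenated(String text) {
        return text.replaceAll("\u00AD\\r?\\n", "");
    }

    public static void main(String[] args) {
        // Made-up sample: "garden" broken across two lines by a soft hyphen.
        String raw = "The rabbit ran into the gar\u00AD\nden.";
        System.out.println(joinSoftHyphenated(raw));
    }
}
```

With the behaviour described above, the same word instead comes out as two fragments separated by a space, and the original word can no longer be recovered reliably from the extracted text.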
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)