
Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/10/11 03:09:33 UTC

[jira] [Updated] (PDFBOX-1429) Add color information to TextPosition

     [ https://issues.apache.org/jira/browse/PDFBOX-1429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-1429:
--------------------------------
    Affects Version/s: 2.0.0

> Add color information to TextPosition
> -------------------------------------
>
>                 Key: PDFBOX-1429
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1429
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Text extraction
>    Affects Versions: 1.7.1, 2.0.0
>            Reporter: PanQuanyi
>            Priority: Minor
>             Fix For: 1.8.8, 2.0.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> The class org.apache.pdfbox.util.TextPosition offers only the position of text on a page and limited font information. (Many Chinese characters have no FontDescriptor, so the font name and other style attributes cannot be retrieved.)
>  I think many people use PDFBox to build a client utility that extracts text and images,
>  and then reorganize the text and images into a new article or book to be read on an iPad or mobile phone, relying on manual work to fix the layout.
> But many books have complex layouts, heavy use of color, and so many pages that this work takes a great deal of human effort. If more of it could be done automatically, the process would be much more efficient.
> So it would help if a class named Text could easily provide the precise position, font size, font style, and color of the text, plus other attributes such as the background color.
> That would make the rest of the text extraction process easier, including excluding unnecessary text and making the text more colorful.
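
A minimal Java sketch of the kind of class this request describes: one run of text carrying its position, font size, font name, and fill color, plus a filter that drops runs of a given color (one way to "exclude unnecessary text"). `StyledText`, `StyledTextFilter`, and `excludeColor` are hypothetical names, not PDFBox API. The sketch assumes the color has already been resolved to packed RGB, and it allows the font name to be null when the font has no FontDescriptor.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical value class for the requested "Text": one run of extracted text
 * with its style attributes. This is NOT part of PDFBox.
 */
final class StyledText {
    final String text;
    final float x;          // x coordinate on the page (user-space units)
    final float y;          // y coordinate on the page (user-space units)
    final float fontSize;   // font size in points
    final String fontName;  // may be null, e.g. when the font has no FontDescriptor
    final int rgb;          // fill (non-stroking) color, assumed packed as 0xRRGGBB

    StyledText(String text, float x, float y, float fontSize, String fontName, int rgb) {
        this.text = text;
        this.x = x;
        this.y = y;
        this.fontSize = fontSize;
        this.fontName = fontName;
        this.rgb = rgb;
    }
}

/** Hypothetical helper showing how a color attribute could drive filtering. */
final class StyledTextFilter {
    private StyledTextFilter() {}

    /** Returns the runs whose fill color is not exactly rgbToDrop, preserving order. */
    static List<StyledText> excludeColor(List<StyledText> runs, int rgbToDrop) {
        List<StyledText> kept = new ArrayList<>();
        for (StyledText run : runs) {
            if (run.rgb != rgbToDrop) {
                kept.add(run);
            }
        }
        return kept;
    }
}
```

Matching on an exact RGB value keeps the sketch simple. A real extractor would probably need to compare colors with some tolerance and convert from non-RGB color spaces first.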



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)