Posted to dev@pdfbox.apache.org by "Swati Agarwal (JIRA)" <ji...@apache.org> on 2009/05/16 22:14:45 UTC

[jira] Updated: (PDFBOX-27) extract the text of certain page at certain line

     [ https://issues.apache.org/jira/browse/PDFBOX-27?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Swati Agarwal updated PDFBOX-27:
--------------------------------

    Attachment: TextStripper.patch

Hi 

I have added functionality for extracting text from a PDF file up to a certain line, optionally restricted to certain pages.
The new method "getTextTillLine" in 'org.pdfbox.util.PDFTextStripper.java' takes two parameters: a 'PDDocument' object and a user-specified 'String'.
The method parses the PDF and gets the text up to the point where the String first appears in the document.
If page numbers are specified, the String is searched for only in those pages; otherwise the whole document is searched.
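
To illustrate the idea, here is a minimal sketch of the cut-off step on text that has already been extracted. It is not the code from the attached patch: the class name TextUntilMarker and the method textUntil are hypothetical, and it assumes that the text stops just before the first occurrence of the String and that the whole text comes back when the String is absent or empty. With PDFBox, the input would presumably come from PDFTextStripper.getText(PDDocument), with setStartPage/setEndPage limiting the page range.

```java
// Hypothetical helper, not part of PDFBox and not the attached patch.
final class TextUntilMarker {

    private TextUntilMarker() {
        // static helper only
    }

    /**
     * Returns the part of the extracted text that precedes the first
     * occurrence of the marker.
     *
     * Assumptions made for this sketch:
     *  - the marker itself is not included in the result;
     *  - if the marker is null, empty, or not found, the whole text is returned.
     */
    static String textUntil(String text, String marker) {
        if (marker == null || marker.isEmpty()) {
            return text;
        }
        int index = text.indexOf(marker);
        return index < 0 ? text : text.substring(0, index);
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from the selected pages of a PDF.
        String extracted = "Title\nIntroduction\nReferences\nAppendix\n";
        System.out.print(textUntil(extracted, "References"));
    }
}
```

Working on the full extracted string keeps the sketch simple; stopping the extraction itself as soon as the String is seen would avoid processing the remaining pages, at the cost of hooking into the stripper.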

Thanks
Swati

> extract the text of certain page at certain line
> ------------------------------------------------
>
>                 Key: PDFBOX-27
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-27
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Text extraction
>         Attachments: TextStripper.patch
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=1050558
> Originally submitted by nobody on 2004-10-20 01:44.
> Sometimes it is unnecessary to extract all the text in a
> PDF file :-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.