Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/02/13 03:11:00 UTC

[jira] [Closed] (PDFBOX-4769) Problem pdf version 1.4

     [ https://issues.apache.org/jira/browse/PDFBOX-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr closed PDFBOX-4769.
-----------------------------------
    Resolution: Not A Bug

> Problem pdf version 1.4
> -----------------------
>
>                 Key: PDFBOX-4769
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4769
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.17
>         Environment: Java, Maven
>            Reporter: NathanJ
>            Priority: Blocker
>
> Here is my problem. I have to read PDF files and decided to use PDFBox. I'm using the following code to read my file line by line and perform some actions on each line:
> File tempFile = new File("myPdfFile");
> try (PDDocument document = PDDocument.load(tempFile))
> {
>     if (!document.isEncrypted())
>     {
>         PDFTextStripperByArea stripper = new PDFTextStripperByArea();
>         stripper.setSortByPosition(true);
>         PDFTextStripper tStripper = new PDFTextStripper();
>         String pdfFileInText = tStripper.getText(document);
>         String[] lines = pdfFileInText.split("\\r?\\n");
>     }
> }
> For a PDF in version 1.7 format, everything works well. But sometimes I have to work with PDF version 1.4, and in that case there is a problem: PDFTextStripper is unable to read the PDF, and my "pdfFileInText" gets the value "\r\n\r\n" and nothing else.
>  
> I didn't find any solutions on the web.
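
One detail of the value quoted in the description can be checked without PDFBox. The following is a minimal sketch, assuming only the string "\r\n\r\n" from the report: splitting it with the same pattern yields an empty array, because String.split drops trailing empty strings, so a loop over the lines would never run. The class and method names (ExtractedTextCheck, splitLines, hasNoVisibleText) are hypothetical helpers for illustration and are not part of PDFBox.

```java
import java.util.Arrays;

// Hypothetical helper class, not part of PDFBox.
final class ExtractedTextCheck {

    // Splits on "\n" or "\r\n", the same pattern used in the report.
    static String[] splitLines(String text) {
        return text.split("\\r?\\n");
    }

    // True when the text is null or holds only whitespace and line breaks.
    static boolean hasNoVisibleText(String text) {
        return text == null || text.trim().isEmpty();
    }

    public static void main(String[] args) {
        String extracted = "\r\n\r\n"; // value quoted in the report

        // split() discards trailing empty strings, so nothing is left here.
        System.out.println(splitLines(extracted).length);             // prints 0
        System.out.println(hasNoVisibleText(extracted));              // prints true

        // Mixed line endings are handled by the optional "\r".
        System.out.println(Arrays.toString(splitLines("a\r\nb\nc"))); // prints [a, b, c]
    }
}
```

A check such as hasNoVisibleText makes the empty-extraction case explicit before any line-by-line processing. It does not say why the stripper returned no text for a given file.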



