Posted to dev@pdfbox.apache.org by "Clemens Wyss (JIRA)" <ji...@apache.org> on 2013/12/23 16:10:58 UTC
[jira] [Updated] (PDFBOX-1821) Parsing (extracting content) a single 5Mb pdf file takes 3minutes
[ https://issues.apache.org/jira/browse/PDFBOX-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Clemens Wyss updated PDFBOX-1821:
---------------------------------
Attachment: takes3mins.pdf
the "bad" pdf ;)
> Parsing (extracting content) a single 5Mb pdf file takes 3minutes
> -----------------------------------------------------------------
>
> Key: PDFBOX-1821
> URL: https://issues.apache.org/jira/browse/PDFBOX-1821
> Project: PDFBox
> Issue Type: Bug
> Environment: Win7 (8G memory)
> Java 6
> Reporter: Clemens Wyss
> Attachments: takes3mins.pdf
>
>
> When I try to extract the attached pdf-file with the following code:
> ...
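> // Stripper that walks the document and emits its text content
> PDFTextStripper stripper = new PDFTextStripper();
> OutputStream os = null;
> Writer writer = null;
> PDDocument document = null;
> File file = new File("takes3mins.pdf");
> ...
> // Load and parse the PDF
> document = PDDocument.load(file);
>
> // Destination for the extracted text (the writer uses the platform default charset)
> File outFile = new File("c:/tmp/gugus.txt");
> os = new FileOutputStream(outFile);
> writer = new OutputStreamWriter(os);
>
> // Extract the text of the document and write it to the output file
> stripper.writeText(document, writer);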
> ...
> it takes approx. 3 minutes. Opening it in Acrobat Reader takes only a few seconds.
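
One way to narrow down a report like this is to time the individual steps rather than the whole run. The following is only a minimal sketch under stated assumptions: `ExtractionTimer`, `TextSource`, and `timeMillis` are hypothetical names that are not part of PDFBox, and the extraction step is replaced by a stand-in so the block runs on a plain JDK. It uses lambda syntax, so it assumes Java 8 or later rather than the Java 6 named in the report.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of PDFBox: times any step that writes text to a Writer.
final class ExtractionTimer {

    /** Hypothetical callback for the step being measured. */
    interface TextSource {
        void writeTo(Writer out) throws IOException;
    }

    /** Runs the step against a buffered view of the sink and returns elapsed wall-clock milliseconds. */
    static long timeMillis(TextSource source, Writer sink) throws IOException {
        long start = System.nanoTime();
        BufferedWriter buffered = new BufferedWriter(sink);
        source.writeTo(buffered);
        buffered.flush(); // flush here, but leave closing the sink to the caller
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real extraction step so the sketch runs without PDFBox.
        StringWriter sink = new StringWriter();
        long elapsed = timeMillis(out -> out.write("extracted text"), sink);
        System.out.println("wrote " + sink.getBuffer().length() + " chars in " + elapsed + " ms");
    }
}
```

With PDFBox on the classpath, the call from the report could presumably be plugged in as `timeMillis(out -> stripper.writeText(document, out), writer)`. Timing `PDDocument.load(file)` separately would then show whether loading or text extraction accounts for most of the three minutes.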
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)