Posted to dev@pdfbox.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/11/18 17:26:59 UTC

[jira] [Commented] (PDFBOX-3581) PDFTextStripper not working with multiple threads

    [ https://issues.apache.org/jira/browse/PDFBOX-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15677203#comment-15677203 ] 

Tim Allison commented on PDFBOX-3581:
-------------------------------------

Yes, please share PDFExtractThread.

If you're extracting text from one directory of files into another, consider using Apache Tika, which wraps PDFBox and lets you run directory-to-directory extraction multithreaded from the command line, e.g. with 10 file processors:

java -jar tika-app.jar -i <input_dir> -o <output_dir> -numConsumers 10
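
If you would rather stay in your own Java code, here is a rough sketch of the same directory-to-directory pattern with a fixed pool of consumer threads. It is not how Tika implements it, and every name in it (BatchExtractSketch, Extractor, run) is made up for illustration. The extraction step is left as a pluggable hook so the sketch only needs the JDK; with PDFBox you would presumably load the document and create a fresh PDFTextStripper inside that hook, so that no stripper or document is shared between threads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BatchExtractSketch {

    /** Hypothetical hook: turns one input file into text (e.g. via PDFBox). */
    public interface Extractor {
        String extract(Path file) throws Exception;
    }

    /**
     * Runs the extractor over every regular file in inputDir on a fixed pool
     * of numConsumers threads and writes "<file name>.txt" into outputDir.
     * Returns the number of files that were processed without an exception.
     */
    static int run(Path inputDir, Path outputDir, int numConsumers, Extractor extractor)
            throws IOException, InterruptedException {
        // Collect the input files up front (non-recursive).
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(inputDir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }

        ExecutorService pool = Executors.newFixedThreadPool(numConsumers);
        List<Future<Boolean>> results = new ArrayList<>();
        for (Path file : files) {
            results.add(pool.submit(() -> {
                // Each task handles exactly one file and writes one output file,
                // so tasks share no mutable state with each other.
                String text = extractor.extract(file);
                Path target = outputDir.resolve(file.getFileName() + ".txt");
                Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                return true;
            }));
        }
        pool.shutdown(); // no new tasks; already submitted ones still run

        int succeeded = 0;
        for (Future<Boolean> result : results) {
            try {
                if (result.get()) { // blocks until that task has finished
                    succeeded++;
                }
            } catch (ExecutionException e) {
                // One bad file should not stop the rest of the batch.
                System.err.println("Extraction failed: " + e.getCause());
            }
        }
        return succeeded;
    }
}
```

Calling BatchExtractSketch.run(inputDir, outputDir, 10, extractor) would roughly correspond to the 10 file processors in the command above. Failures surface through Future.get(), so a thread that dies on one file is reported rather than silently appearing to do nothing.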

> PDFTextStripper not working with multiple threads
> -------------------------------------------------
>
>                 Key: PDFBOX-3581
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3581
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Text extraction
>    Affects Versions: 2.0.3
>         Environment: Ubuntu 15.1
>            Reporter: Dmitri Russu
>              Labels: multithreading
>
> Hi, I am trying to use PDFBox to extract text from a list of files. The problem is that PDFTextStripper does not work in threaded mode: when I run it from multiple threads, nothing happens. Is this a bug or a limitation?
> Could you help me?
> Thanks


