Posted to dev@pdfbox.apache.org by Edson Alves Pereira <lo...@gmail.com> on 2012/01/17 14:47:45 UTC

Adding support to listener while parsing pdf to text - PDFTextStripper.java

To help users get more interaction over the parsing processes.

Re: Adding support to listener while parsing pdf to text - PDFTextStripper.java

Posted by Edson Alves Pereira <lo...@gmail.com>.
I created issue 1210 for that feature.

On Tue, Jan 17, 2012 at 3:04 PM, Edson Alves Pereira <lo...@gmail.com> wrote:

> right!

Re: Adding support to listener while parsing pdf to text - PDFTextStripper.java

Posted by Edson Alves Pereira <lo...@gmail.com>.
right!

On Tue, Jan 17, 2012 at 1:22 PM, Martinez, Mel - 1004 - MITLL <
m.martinez@ll.mit.edu> wrote:

> Three things:
>
> 1)      Great idea!
>
> 2)      This should probably be proposed through a Jira ‘new feature’
> issue. You can attach the files to the issue.
> https://issues.apache.org/jira/browse/PDFBOX
>
> 3)      I would recommend it be proposed and implemented as a
> subclass of PDFTextStripper (i.e., something like
> “ObservablePDFTextStripper”) instead of wired into that class directly.
> Almost all of PDFTextStripper can be overridden in a subclass, so you should
> be able to fully instrument this in a subclass. There is a tiny but real
> performance hit for broadcasting events (even if you have no listeners), and
> it would be my preference that we do not introduce that overhead into the
> main PDFTextStripper class. Our group uses PDFTextStripper to process a
> large number of documents, so performance is important to us.
>
> Cheers,
>
> Mel
>
> From: Edson Alves Pereira [mailto:lottalava@gmail.com]
> Sent: Tuesday, January 17, 2012 8:48 AM
> To: dev@pdfbox.apache.org
> Cc: Raul Abreu Leite
> Subject: Adding support to listener while parsing pdf to text -
> PDFTextStripper.java
>
> To help users get more interaction over the parsing processes.
>

RE: Adding support to listener while parsing pdf to text - PDFTextStripper.java

Posted by "Martinez, Mel - 1004 - MITLL" <m....@ll.mit.edu>.
Three things:

1)      Great idea!

2)      This should probably be proposed through a Jira 'new feature' issue.
You can attach the files to the issue.
https://issues.apache.org/jira/browse/PDFBOX

3)      I would recommend it be proposed and implemented as a subclass of
PDFTextStripper (i.e., something like "ObservablePDFTextStripper") instead of
wired into that class directly. Almost all of PDFTextStripper can be
overridden in a subclass, so you should be able to fully instrument this in a
subclass. There is a tiny but real performance hit for broadcasting events
(even if you have no listeners), and it would be my preference that we do not
introduce that overhead into the main PDFTextStripper class. Our group
uses PDFTextStripper to process a large number of documents, so performance
is important to us.
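
A minimal sketch of the subclass approach in point 3 might look like the
following. To keep it runnable without PDFBox on the classpath, BaseStripper
is a hypothetical stand-in for PDFTextStripper, and its hook names
(startPage, writeString, endPage), the StripperListener interface, and the
ObservableStripper class are all assumptions made for illustration, not the
actual PDFBox API. The point is only that event broadcasting lives in the
subclass, so the base class pays nothing for it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for PDFTextStripper so the sketch compiles without PDFBox.
class BaseStripper {
    // Walks the pages in order, calling the overridable hooks, and returns the text.
    public String getText(List<String> pages) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < pages.size(); i++) {
            int pageNumber = i + 1;
            startPage(pageNumber);
            writeString(pages.get(i), out);
            endPage(pageNumber);
        }
        return out.toString();
    }

    // Hooks: no-ops or plain output here, so the base class has no listener overhead.
    protected void startPage(int pageNumber) { }

    protected void writeString(String text, StringBuilder out) {
        out.append(text).append('\n');
    }

    protected void endPage(int pageNumber) { }
}

// Hypothetical listener interface; the event names are illustrative only.
interface StripperListener {
    void pageStarted(int pageNumber);
    void textWritten(String text);
    void pageEnded(int pageNumber);
}

// Subclass that broadcasts events; only users who opt in pay the cost.
class ObservableStripper extends BaseStripper {
    // CopyOnWriteArrayList lets listeners be added or removed during a broadcast.
    private final List<StripperListener> listeners = new CopyOnWriteArrayList<>();

    public void addListener(StripperListener listener) { listeners.add(listener); }

    public void removeListener(StripperListener listener) { listeners.remove(listener); }

    @Override
    protected void startPage(int pageNumber) {
        super.startPage(pageNumber);
        for (StripperListener l : listeners) l.pageStarted(pageNumber);
    }

    @Override
    protected void writeString(String text, StringBuilder out) {
        super.writeString(text, out);
        for (StripperListener l : listeners) l.textWritten(text);
    }

    @Override
    protected void endPage(int pageNumber) {
        super.endPage(pageNumber);
        for (StripperListener l : listeners) l.pageEnded(pageNumber);
    }
}

class ObservableStripperDemo {
    public static void main(String[] args) {
        ObservableStripper stripper = new ObservableStripper();
        stripper.addListener(new StripperListener() {
            @Override public void pageStarted(int n) { System.out.println("page " + n + " started"); }
            @Override public void textWritten(String t) { System.out.println("text: " + t); }
            @Override public void pageEnded(int n) { System.out.println("page " + n + " ended"); }
        });
        stripper.getText(Arrays.asList("Hello", "World"));
    }
}
```

With the real library, the same pattern would presumably mean extending
PDFTextStripper and overriding whichever protected methods it exposes for
page and text events, then calling super before notifying listeners.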

 

Cheers,

Mel

 

From: Edson Alves Pereira [mailto:lottalava@gmail.com] 
Sent: Tuesday, January 17, 2012 8:48 AM
To: dev@pdfbox.apache.org
Cc: Raul Abreu Leite
Subject: Adding support to listener while parsing pdf to text -
PDFTextStripper.java

 

To help users get more interaction over the parsing processes.