Posted to users@pdfbox.apache.org by "psconceicao@outlook.com" <ps...@outlook.com> on 2016/12/27 23:52:43 UTC

Identify not visible characters - Overlapped characters

Hello everyone,

I am using PDFBox 1.8.12 (because I'm developing in C#) and I can extract all characters from a PDF with the respective position.

My objective is to perform a layout analysis and try to reproduce the PDF layout in a text file.
However, I'm facing a huge problem: identifying characters that are not visible.

In the attached file, the text "Alandroal (Nossa Senhora da Conceic..." runs into the space occupied by the word "Rural" (row 5), but the overlapping part is not visible.

I would like someone to help me find a way to identify the text that is not visible, so that I can leave those characters out of the text file.

This approach: http://stackoverflow.com/questions/19809813/how-to-check-if-a-text-is-transparent-with-pdfbox doesn't work for the attached file (it only works with images).


Many thanks in advance,
Paulo Sergio

Re: Identify not visible characters - Overlapped characters

Posted by Tilman Hausherr <TH...@t-online.de>.

Please upload the PDF somewhere.

Tilman


Re: Identify not visible characters - Overlapped characters

Posted by Manuel Aristarán <ma...@jazzido.com>.

> On Dec 28, 2016, at 5:49 PM, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> Ah, you're extending PageDrawer. That of course gives you the clipping path on a silver plate :-)

Indeed. Most of Tabula's magic happens there. Also, we're lamenting that there's no direct equivalent to PageDrawer in PDFBox 2…adapting Tabula to the new version is getting a bit tricky :)

—
Manuel Aristarán <ma...@jazzido.com>
http://jazzido.com

Re: Identify not visible characters - Overlapped characters

Posted by Tilman Hausherr <TH...@t-online.de>.

On 28.12.2016 at 21:32, Manuel Aristarán wrote:
>> On Dec 28, 2016, at 8:18 AM, Tilman Hausherr <TH...@t-online.de> wrote:
>>
>> […]
>> Try also https://github.com/tabulapdf/ , I wonder how they handle this problem.
> Hi, main author of Tabula here.
>
> We've come across that case many times. Some spreadsheet->PDF generators clip a cell's content to the extent of its container. We handle it by simply detecting whether a character is inside the current clipping path [1].
>
> Cheers,
>
> [1] https://github.com/tabulapdf/tabula-java/blob/master/src/main/java/technology/tabula/ObjectExtractor.java#L342

Ah, you're extending PageDrawer. That of course gives you the clipping 
path on a silver plate :-)

Tilman

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org

Re: Identify not visible characters - Overlapped characters

Posted by Manuel Aristarán <ma...@jazzido.com>.

> On Dec 28, 2016, at 8:18 AM, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> […]
> Try also https://github.com/tabulapdf/ , I wonder how they handle this problem.

Hi, main author of Tabula here.

We've come across that case many times. Some spreadsheet->PDF generators clip a cell's content to the extent of its container. We handle it by simply detecting whether a character is inside the current clipping path [1].

Cheers,

[1] https://github.com/tabulapdf/tabula-java/blob/master/src/main/java/technology/tabula/ObjectExtractor.java#L342

—
Manuel Aristarán <ma...@jazzido.com>
http://jazzido.com
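
The clipping test described in the message above can be illustrated with plain java.awt.geom types. This is only a minimal sketch, not Tabula's or PDFBox's code: the class ClipFilter, the Glyph holder and the coordinates are hypothetical, and it assumes that a glyph counts as visible only when its bounding box lies entirely inside the clipping shape.

```java
import java.awt.Shape;
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only glyphs whose box is inside the clip.
final class ClipFilter {

    // Minimal glyph holder: the text plus its bounding box in page space.
    static final class Glyph {
        final String text;
        final Rectangle2D box;

        Glyph(String text, double x, double y, double w, double h) {
            this.text = text;
            this.box = new Rectangle2D.Double(x, y, w, h);
        }
    }

    // Concatenates the glyphs that are fully contained in the clip shape.
    static String visibleText(Shape clip, List<Glyph> glyphs) {
        StringBuilder out = new StringBuilder();
        for (Glyph g : glyphs) {
            if (clip.contains(g.box)) {
                out.append(g.text);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up cell that is 100 units wide; the last two glyphs overflow it.
        Shape cell = new Rectangle2D.Double(0, 0, 100, 12);
        List<Glyph> glyphs = new ArrayList<>();
        glyphs.add(new Glyph("C", 84, 2, 6, 8));
        glyphs.add(new Glyph("o", 90, 2, 6, 8));
        glyphs.add(new Glyph("n", 96, 2, 6, 8));
        glyphs.add(new Glyph("c", 102, 2, 6, 8));
        System.out.println(visibleText(cell, glyphs)); // prints "Co"
    }
}
```

Whether a partly clipped glyph should be kept is a policy choice; replacing contains() with intersects() would keep glyphs that are only partly inside the clip.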

Re: Identify not visible characters - Overlapped characters

Posted by Peter Murray-Rust <pm...@cam.ac.uk>.

Greetings,
Over the years we have developed PDF2SVG (
https://bitbucket.org/petermr/pdf2svg/overview) which is built on PDFBOX
1.8 and uses PageDrawer to capture the primitives (characters, paths,
images). It tries to carry out a faithful extraction without semantic loss
and there is effort on capturing styles, font-weights, and translating
non-standard characters to Unicode. This is particularly important for high
Unicode points such as mathematical symbols. There is emphasis on
downstream analysis (e.g. converting paths to SVG primitives such as
circles and rects). The main downstream emphasis is on scientific and
technical documents and translating the complete contents (text, diagrams,
images, tables, etc.) to semantic form.

You are welcome to try it and see whether it helps with your problem. The
clipping paths are initially preserved, I think, but not output to the
final SVG.

We haven't needed to change it for a year or so, but have wondered about
converting to PDFBox 2.0. Unfortunately, as Manuel (hi!!) says, this requires
considerable rewriting (and I'd be interested in knowing whether anyone has
written an equivalent way of capturing the PageDrawer output).

We are now being asked to process some documents in bulk and can process a
subset of tables, but we don't intend to duplicate complete Tabula
functionality.

P.





-- 
Peter Murray-Rust
Reader Emeritus in Molecular Informatics
Unilever Centre, Dept. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069

Re: Identify not visible characters - Overlapped characters

Posted by John Logan <Jo...@texture.com>.

Hi Paulo,

Is your layout analysis focused on extracting tabular data (records) from a PDF file?  Or are you trying to handle more general layouts?

PDFBOX-2998 contains detailed discussion about enhancing the extraction algorithms, including adding advanced layout analysis.  The argument against this is that it's very hard to simultaneously achieve high quality and general applicability.

The current text extractor allows a developer to override the text output methods, but the core is fairly monolithic.  It'd be nice to rework the text extraction so that the process was more modular, and so that alternate processes could include components from externally-developed classes and libraries.  This way, PDFBox doesn't need to solve the general layout analysis problem, but it would be easier to develop extensions that solve specific problems well.

For what it's worth, the way I currently approach it is to define a PdfTextFeatureExtractor that extends PDFStreamEngine.  In particular, the new class overrides the showGlyph() method to write a YAML file that contains detailed information for each rendered glyph.

From there one can develop whatever one wants for layout extraction and all of the other segmentation and classification tasks.  The core layout analysis techniques I chose for my work are based on the paper "Two Geometric Algorithms for Layout Analysis", by Thomas Breuel.

Best regards,

John
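
As a rough illustration of the per-glyph YAML output mentioned above, the sketch below shows one way such a record could be serialized. It is not the PdfTextFeatureExtractor from the message: the GlyphRecord class and its field names are hypothetical, and how the values would be obtained inside a showGlyph() override is deliberately left out.

```java
import java.util.Locale;

// Hypothetical record of one rendered glyph, serialized as a YAML list item.
final class GlyphRecord {
    final String text;
    final String fontName;
    final float fontSize;
    final float x, y, width, height;

    GlyphRecord(String text, String fontName, float fontSize,
                float x, float y, float width, float height) {
        this.text = text;
        this.fontName = fontName;
        this.fontSize = fontSize;
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    // Double-quoted YAML scalar; backslashes and quotes are escaped.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One YAML sequence entry; Locale.ROOT keeps '.' as the decimal separator.
    String toYaml() {
        return "- text: " + quote(text) + "\n"
             + "  font: " + quote(fontName) + "\n"
             + String.format(Locale.ROOT,
                   "  size: %.2f\n  x: %.2f\n  y: %.2f\n  width: %.2f\n  height: %.2f\n",
                   fontSize, x, y, width, height);
    }

    public static void main(String[] args) {
        System.out.print(new GlyphRecord("C", "Arial", 9f, 84f, 2f, 6f, 8f).toYaml());
    }
}
```

Writing one entry per glyph keeps the extraction step independent of whatever layout analysis is run on the file afterwards.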


Re: Identify not visible characters - Overlapped characters

Posted by Manuel Aristarán <ma...@jazzido.com>.

> On Dec 28, 2016, at 6:29 PM, psconceicao@outlook.com wrote:
> 
> […] All the others tools that I tested (and I tested several) do wrong positioning analysis. […]

So you want to do PDF -> TXT conversion, and *preserve* the layout of the text?

There are two FOSS tools that perform quite well for that task:

  - Poppler's pdftotext (with the -layout option)
  - MuPDF's mudraw

Best,


—
Manuel Aristarán <ma...@jazzido.com>
http://jazzido.com




RE: Identify not visible characters - Overlapped characters

Posted by "psconceicao@outlook.com" <ps...@outlook.com>.

Hi Manuel,

I'm sorry for my mistake and many thanks for your help and attention.

The best tool I know of for extracting text from a PDF while maintaining the correct layout (I didn't test Monarch) is part of a CAAT package: Caseware IDEA. However, this software is very expensive and does a lot of other things.

All the other tools that I tested (and I tested several) get the positioning analysis wrong.

It would be good to develop a tool that produces results similar to those obtained with IDEA.

The work you have done can help others achieve that result.

Paulo

Re: Identify not visible characters - Overlapped characters

Posted by Manuel Aristarán <ma...@jazzido.com>.

Hi Paulo,

> On Dec 28, 2016, at 9:52 AM, psconceicao@outlook.com wrote:
> 
> Unfortunately, Tabula uses a totally different approach (image analysis)
> [...]

Sorry for going (sort of) off-topic, but that's not correct. In fact, Tabula does not support images. Thanks to PDFBox, it "mines" text and graphical elements, and uses a set of heuristics that attempt to reconstruct a tabular structure.

> Tabula also do incoherent analysis when a table is larger than one page, for
> that reason Tabula is far from being a good tool for text extraction with
> correct positioning.

We always welcome bug reports (and patches!) :) [1]

Thanks!

[1] https://github.com/tabulapdf/tabula-java/issues


—
Manuel Aristarán <ma...@jazzido.com>
http://jazzido.com





Re: Identify not visible characters - Overlapped characters

Posted by Tilman Hausherr <TH...@t-online.de>.

On 28.12.2016 at 13:52, psconceicao@outlook.com wrote:
> Thank you.
>
> Unfortunately, Tabula uses a totally different approach (image analysis)
> that works well for the exposed problem, but in others do a lot of errors.
> Tabula also do incoherent analysis when a table is larger than one page, for
> that reason Tabula is far from being a good tool for text extraction with
> correct positioning.
>
> I'm going to study the clipping operators, and try to solve this kind of
> issues.
>
> The type of file shown is very common in result of printing spreadsheets to
> PDF, especially in cases where the content of one cell is larger than what
> is seen in the screen, because the next cell also has content. For that
> reason, I really have to solve this kind of problem.
>
> Tilman, do you know any peace of code to give a little help?

There is no code that does what you need. What I recommend is that you 
look at PageDrawer.java and search for "clip". Then use these in a text 
stripper. You need to make sure that some of the operators listed in 
PageDrawer.properties are invoked, e.g. 
org.apache.pdfbox.util.operator.pagedrawer.ClipNonZeroRule (W operator) 
and several more; these must be added to PDFTextStripper.properties or 
to a new property file. (It is different in 2.0.)

If you manage the clipping path correctly, the next step would be to 
identify the position of each glyph and then decide whether it is in the 
clipping path or not.

PrintTextLocations is a code sample that you can use to see how things 
are called. To debug rendering, have a look at PDFToImage.

Tilman
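
The bookkeeping this implies can be sketched without any PDFBox classes. The following is only an illustration under stated assumptions: ClipTracker and its method names are hypothetical, the clip starts out as the page box, and the caller is assumed to invoke save() for the q operator, restore() for Q, and clip() once a W clipping path takes effect.

```java
import java.awt.Shape;
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical tracker for the current clipping region of a content stream.
final class ClipTracker {
    private final Deque<Area> saved = new ArrayDeque<>();
    private Area current;

    // The initial clip is assumed to be the whole page box.
    ClipTracker(Rectangle2D pageBox) {
        current = new Area(pageBox);
    }

    // "q": remember a copy of the current clip.
    void save() {
        saved.push(new Area(current));
    }

    // "Q": go back to the most recently saved clip, if any.
    void restore() {
        if (!saved.isEmpty()) {
            current = saved.pop();
        }
    }

    // "W": a new clipping path can only shrink the region (intersection).
    void clip(Shape path) {
        current.intersect(new Area(path));
    }

    // True if the glyph's bounding box lies entirely inside the clip.
    boolean contains(Rectangle2D glyphBox) {
        return current.contains(glyphBox);
    }
}
```

A text stripper extension would call contains() for each glyph position it receives and skip the glyphs for which it returns false.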



RE: Identify not visible characters - Overlapped characters

Posted by "psconceicao@outlook.com" <ps...@outlook.com>.

Thank you.

Unfortunately, Tabula uses a totally different approach (image analysis)
that works well for the problem described, but makes a lot of errors in other
cases. Tabula's analysis is also inconsistent when a table spans more than one
page; for that reason Tabula is far from being a good tool for text extraction
with correct positioning.

I'm going to study the clipping operators and try to solve this kind of
issue.

The type of file shown is very common when spreadsheets are printed to
PDF, especially when the content of one cell is longer than what is shown
on screen because the next cell also has content. For that reason, I really
have to solve this kind of problem.

Tilman, do you know of any piece of code that could help a little?

Many thanks!

Paulo


Re: Identify not visible characters - Overlapped characters

Posted by Tilman Hausherr <TH...@t-online.de>.

On 28.12.2016 at 00:52, psconceicao@outlook.com wrote:
>
> Hello everyone,
>
> I am using PDFBox 1.8.12 (because I'm developing in C#) and I can 
> extract all characters from a PDF with the respective position.
>
> My objective is to perform a layout analysis and try to reproduce the 
> PDF layout in a text file.
>
> However, I'm facing a huge problem: identify not visible characters.
>
> In the annexed file, the text "Alandroal (Nossa Senhora da Conceic..." 
> is using some space used by the word "Rural" (row 5), but not visible.
>

Ooohhhh... your file shows one interesting effect, which may or may not 
be a bug: text extraction shows more data than in rendering. "Alandroal 
(Nossa Senhora da Conceicao)" is extracted in full, but in rendering we 
only see "Alandroal (Nossa Senhora da Co" due to clipping.

This may need a change deep in PDFBox itself, i.e. checking whether a 
glyph is in the clipping region or not. For that, you'd need to have a 
look at PageDrawer.java, and copy all clipping operations to the text 
stripper (or extend the text stripper). I'd rather recommend doing this 
with the 2.0 version, to avoid a lot of work moving from 1.8 to 2.0 at 
a later time.

Try also https://github.com/tabulapdf/ , I wonder how they handle this 
problem.

Tilman
