Posted to users@pdfbox.apache.org by Duane Nickull <du...@technoracle-systems.com> on 2012/12/01 05:30:01 UTC

Re: Spaces (char 32) in output of PDFBox

You mean you just posted here without reading through the 1700 page
specification, 23 auxiliary PDF specifications and over 225,000 pages of
reference materials just looking for a quick answer?

;-)

AFAIK, decimal 32 was supported in PostScript as early as 1996. I believe
that carried over into PDF, where it is treated as a character rather than
a glyph. My memory from my Adobe days is fading rapidly, but I do believe it
is a legal character. Below you ask a question. Is there a context to it
(e.g. a decision that needs to be made)? I might be able to help if there
is a specific reason this is required. Feel free to ping me off list if
that is more appropriate.
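To illustrate the point at the content-stream level: PDF text is painted by string operands of operators such as Tj and TJ, and a space is simply byte 32 inside such a string, e.g. (Hello World). PDF's word-spacing parameter (set with Tw) is applied to every occurrence of the single-byte code 32 in a string shown with a simple font, which is one sign that the space code is meaningful there. A TJ array can also separate words with a numeric adjustment and no space byte at all, e.g. [(Hello) -250 (World)], and an extractor then has to decide whether to infer a space from the gap. The toy Java sketch below (not PDFBox code) tells the two cases apart. It assumes simplified syntax (literal strings without escapes or nested parentheses); the class name, its methods and the GAP_THRESHOLD value are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration, not PDFBox code: scans the operand of one TJ operator and
 * separates explicit space bytes (code 32) inside string operands from numeric
 * positioning adjustments that only move the text position.
 * Assumes literal strings in (...) with no escapes or nested parentheses.
 */
public class TjSpaceSketch {

    /** Hypothetical heuristic, in thousandths of a text space unit; not from the PDF spec. */
    static final double GAP_THRESHOLD = 200.0;

    /** Counts literal space characters (code 32) that sit inside string operands. */
    static int countExplicitSpaces(String tjArray) {
        int count = 0;
        boolean inString = false;
        for (char c : tjArray.toCharArray()) {
            if (c == '(') {
                inString = true;
            } else if (c == ')') {
                inString = false;
            } else if (inString && c == ' ') {
                count++;
            }
        }
        return count;
    }

    /** Collects the numeric adjustments that appear outside string operands. */
    static List<Double> adjustments(String tjArray) {
        List<Double> result = new ArrayList<>();
        StringBuilder token = new StringBuilder();
        boolean inString = false;
        for (char c : tjArray.toCharArray()) {
            if (c == '(') {
                flush(token, result);
                inString = true;
            } else if (c == ')') {
                inString = false;
            } else if (!inString && (Character.isDigit(c) || c == '-' || c == '.')) {
                token.append(c); // part of a number between strings
            } else if (!inString) {
                flush(token, result); // whitespace, brackets or operator name end a number
            }
        }
        flush(token, result);
        return result;
    }

    private static void flush(StringBuilder token, List<Double> out) {
        if (token.length() > 0) {
            out.add(Double.parseDouble(token.toString()));
            token.setLength(0);
        }
    }

    /** Counts adjustments large enough that an extractor might infer a word gap. */
    static int countInferredGaps(String tjArray) {
        int count = 0;
        for (double adj : adjustments(tjArray)) {
            // In horizontal writing a negative TJ number moves the next glyph to the right.
            if (-adj >= GAP_THRESHOLD) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String explicit = "[(Hello World)] TJ";
        String kerned = "[(Hello) -250 (World)] TJ";
        System.out.println(countExplicitSpaces(explicit) + " " + countInferredGaps(explicit));
        System.out.println(countExplicitSpaces(kerned) + " " + countInferredGaps(kerned));
    }
}
```

Running main prints "1 0" for the string with an explicit space and "0 1" for the kerned one. Which form a given PDF uses depends on the software that produced it.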

Cheers!

Duane Nickull
***********************************
Technoracle Advanced Systems Inc.
Consulting and Contracting; Proven Results!
i.  Neo4J, PDF, Java, LiveCycle ES, Flex, AIR, CQ5 & Mobile
b. http://technoracle.blogspot.com
t.  @duanechaos
"Don't fear the Graph!  Embrace Neo4J"

On 2012-11-29 3:09 PM, "Peter Murray-Rust" <pm...@cam.ac.uk> wrote:

>I am analysing running text by capturing the
>org.apache.pdfbox.util.TextPosition objects that PDFBox produces in a
>subclass of org.apache.pdfbox.pdfviewer.PageDrawer. I notice that there
>are explicit characters for spaces (char 32). Sometimes there are repeated
>spaces and even a "paragraph" consisting only of a space. I was unaware
>that PDF supported spaces. Are these coming from the original document, or
>are they generated by PDFBox from calculations of character spacing and
>width?
>
>TIA for help.
>
>P.
>
>-- 
>Peter Murray-Rust
>Reader in Molecular Informatics
>Unilever Centre, Dep. Of Chemistry
>University of Cambridge
>CB2 1EW, UK
>+44-1223-763069