Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2013/10/06 16:17:42 UTC

[jira] [Commented] (PDFBOX-1709) processEncodedText gives x-coord short by width of previous text, for next text at same y-coord.

    [ https://issues.apache.org/jira/browse/PDFBOX-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13787614#comment-13787614 ] 

Andreas Lehmkühler commented on PDFBOX-1709:
--------------------------------------------

I checked all 3 cases. Here are the facts. All 3 PDFs are similar. The one and only difference is the text-showing operator:

*case 0:*

(Hello world.)Tj -> print the text as is

PrintTextLocations output:

[100.0,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=14.440002]H
[114.44,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.120003]e
[125.560005,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.439995]l
[130.0,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.4400024]l
[134.44,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.119995]o
[145.56,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=5.5599976] 
[151.12,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=14.440002]w
[165.56,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.119995]o
[176.68,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=6.6600037]r
[183.34,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.4400024]l
[187.78,720.0 fs=20.0 xscale=20.0 height=14.660001 space=5.5600004 width=11.119995]d
[198.9,720.0 fs=20.0 xscale=20.0 height=14.660001 space=5.5600004 width=5.5599976].


*case 1:*

[(Hello)-277.996(world.)]TJ -> print "Hello", add -(-277.996)/1000*20 = 5.55992 (about 5.56) to the x-coord and print "world."

PrintTextLocations output:

[100.0,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=14.440002]H
[114.44,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.120003]e
[125.560005,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.439995]l
[130.0,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.4400024]l
[134.44,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.119995]o
[151.11992,720.0 fs=20.0 xscale=20.0 height=10.46 space=5.5600004 width=14.440002]w
[165.55992,720.0 fs=20.0 xscale=20.0 height=11.040001 space=5.5600004 width=11.119995]o
[176.67992,720.0 fs=20.0 xscale=20.0 height=11.040001 space=5.5600004 width=6.6600037]r
[183.33992,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.4400024]l
[187.77992,720.0 fs=20.0 xscale=20.0 height=14.660001 space=5.5600004 width=11.119995]d
[198.89992,720.0 fs=20.0 xscale=20.0 height=14.660001 space=5.5600004 width=5.5599976].

*case 2:*

[(Hello)-5000(world.)]TJ  -> print "Hello", add -(-5000)/1000*20 = 100 to the x-coord and print "world."

PrintTextLocations output:

[100.0,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=14.440002]H
[114.44,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.120003]e
[125.560005,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.439995]l
[130.0,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.4400024]l
[134.44,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=11.119995]o
[245.56,720.0 fs=20.0 xscale=20.0 height=10.46 space=5.5600004 width=14.440002]w
[260.0,720.0 fs=20.0 xscale=20.0 height=11.040001 space=5.5600004 width=11.119995]o
[271.12,720.0 fs=20.0 xscale=20.0 height=11.040001 space=5.5600004 width=6.6600037]r
[277.78,720.0 fs=20.0 xscale=20.0 height=14.360001 space=5.5600004 width=4.4400024]l
[282.22,720.0 fs=20.0 xscale=20.0 height=14.660001 space=5.5600004 width=11.119995]d
[293.34,720.0 fs=20.0 xscale=20.0 height=14.660001 space=5.5600004 width=5.5599976].
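
The arithmetic used for cases 1 and 2 can be checked with a small standalone sketch. It is not PDFBox code: the class and method names are made up for illustration, and it assumes 100% horizontal scaling and no extra scaling from the text matrix, which appears to hold for these test files (fs=20.0, xscale=20.0). The starting x of 145.56 is the position right after "Hello", taken from the listings above (134.44 plus the width 11.12 of the "o").

```java
// Illustrative sketch only; TjAdjustment is a hypothetical name, not a PDFBox class.
public class TjAdjustment {

    // Horizontal displacement caused by a number inside a TJ array.
    // The number is given in thousandths of a text space unit and is
    // subtracted from the current position, so a negative number moves right.
    // Assumption: horizontal scaling is 100% and is therefore left out.
    public static double displacement(double adjustment, double fontSize) {
        return -adjustment / 1000.0 * fontSize;
    }

    public static void main(String[] args) {
        double fontSize = 20.0;
        // x-coord right after "Hello": x of 'o' (134.44) plus its width (11.12)
        double xAfterHello = 145.56;

        // case 1: [(Hello)-277.996(world.)]TJ
        System.out.println("case 1: " + (xAfterHello + displacement(-277.996, fontSize)));
        // case 2: [(Hello)-5000(world.)]TJ
        System.out.println("case 2: " + (xAfterHello + displacement(-5000, fontSize)));
    }
}
```

Up to floating point rounding, the two values should agree with the x-coords reported for "w" in case 1 (151.11992) and case 2 (245.56).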

All coords are as expected, or did I miss something?


> processEncodedText gives x-coord short by width of previous text, for next text at same y-coord.
> ------------------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-1709
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1709
>             Project: PDFBox
>          Issue Type: Bug
>          Components: PDModel
>    Affects Versions: 1.8.2
>         Environment: Windows 7 sp1, Javac 1.6.0_30, Java 1.7.0_17
>            Reporter: Robert Simms
>              Labels: test
>         Attachments: PDFBOX1709-0.pdf, PDFBOX1709-1.pdf, PDFBOX1709-2.pdf
>
>
> Use this PostScript to create PDFs that demonstrate x-coordinate issue with processEncodedText().
> %!
> /Helvetica findfont 20 scalefont setfont
> 100 72 moveto
> (Hello) show
> % CASES
> %    Uncomment any one of the following, make a PDF (with ghostscript ps2pdf, or acrobat distiller),
> %    then process the PDF with java implementation of PDFBox PDFTextStripper.
> %    listing text and x,y positions obtained by overriding the processEncodedText() method.
> %    For example, the x-coord. of a text item may be printed in that method with
> %       System.out.format("%.2f\n", this.getTextMatrix().getXPosition());
> % % 0. Works to convince processEncodedText that string 'Hello world.' was at 100,72.  This is good.
> %
> % ( world.) show
> % % 1. Does not trick processEncodedText into thinking 'Hello' followed by ' ' + 'world.'
> % %    Instead,
> % %    x-coord. of 'world.' reported as being actual position minus width of 'Hello', plus width of ' '
> % %    which is x=105.56 in this case.
> % 
> %( ) stringwidth pop 0 rmoveto
> %(world.) show
> % % 2. Positioning 'world.' within about 500 points from 'Hello', at same vertical position causes
> % %    processEncodedText to give
> % %    x-coord. of 'world.' as actual position minus width of 'Hello'
> % %    which is x=200 in this case.
> %
> %100 0 rmoveto
> %(world.) show
> showpage


