Posted to users@pdfbox.apache.org by Frédéric Ravetier <fr...@vikta.com> on 2024/02/12 17:48:10 UTC

How to find coordinates of word and apply a mask

Hello,

I'd like to find some specific words in a PDF and draw a rectangle over
these words.
I'm using PDFBox 3.0.1

I found this to locate the words:
https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/extract/ExtractWordCoordinates.java
It prints each word and its bounding box with this println:
System.out.println(builder.toString() + " [(X=" + boundingBox.getX() + ",Y=" + boundingBox.getY()
        + ") height=" + boundingBox.getHeight() + " width=" + boundingBox.getWidth() + "]");

I get:
MYSTRING [(X=29.862407684326172,Y=383.78765869140625) height=7.098414897918701 width=50.3477668762207 ]

In my prototype I print this information, then copy and paste x, y, width
and height as hardcoded values into this block of code:

PDPage page = document.getPage(0);
// Append to the existing page content; 'false' leaves the new stream uncompressed.
PDPageContentStream contentStream = new PDPageContentStream(document, page,
        PDPageContentStream.AppendMode.APPEND, false);
contentStream.setNonStrokingColor(Color.RED);
// x, y, width, height copied from the println output above
contentStream.addRect(29.862407684326172f, 383.78765869140625f,
        50.3477668762207f, 7.098414897918701f);
contentStream.fill();
contentStream.close();
document.save(new FileOutputStream(src_file_path.replace(".pdf", "-rect.pdf")));


But the rectangle does not line up with the text in the PDF.
I tried replacing the height with the font size, but that was not much better.

Where is my mistake?

Best regards,
Fred

Re: How to find coordinates of word and apply a mask

Posted by Tilman Hausherr <TH...@t-online.de>.
In PDF, y=0 is at the bottom; in Java it is at the top. See also the javadoc
of the text.getXXX methods. It's a bit tricky; you need to do some trial and error.

Tilman
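
A minimal sketch of the Y flip described above, in plain Java so it runs without PDFBox. It assumes an unrotated page whose lower-left corner is at (0, 0). PdfYFlip, fromTopEdge and fromBaseline are hypothetical names, and 842 is only an assumed page height (A4); in real code the height would presumably come from page.getMediaBox().getHeight(). Which variant applies depends on what the extracted Y means: the top edge of the box measured from the top of the page, or the text baseline measured from the top of the page (which the YDirAdj-style getters appear to return). As the reply says, some trial and error may still be needed.

```java
import java.awt.geom.Rectangle2D;

class PdfYFlip {

    // Input y is the TOP edge of the box, measured downward from the top of the page.
    static Rectangle2D.Float fromTopEdge(Rectangle2D box, float pageHeight) {
        float height = (float) box.getHeight();
        float pdfY = pageHeight - (float) box.getY() - height; // lower edge in PDF space
        return new Rectangle2D.Float((float) box.getX(), pdfY, (float) box.getWidth(), height);
    }

    // Input y is the text BASELINE, measured downward from the top of the page.
    static Rectangle2D.Float fromBaseline(Rectangle2D box, float pageHeight) {
        float pdfY = pageHeight - (float) box.getY(); // baseline becomes the lower edge
        return new Rectangle2D.Float((float) box.getX(), pdfY,
                (float) box.getWidth(), (float) box.getHeight());
    }

    public static void main(String[] args) {
        float pageHeight = 842f; // assumption: A4 page, not taken from the thread
        // Values from the println output in the question, rounded to float precision.
        Rectangle2D box = new Rectangle2D.Float(29.862408f, 383.78766f, 50.347767f, 7.098415f);
        Rectangle2D.Float r = fromBaseline(box, pageHeight);
        // These four numbers are what would be passed to contentStream.addRect(...).
        System.out.println("addRect(" + r.x + ", " + r.y + ", " + r.width + ", " + r.height + ")");
    }
}
```

X, width and height pass through unchanged, which matches the observation in the thread that X looked right and only Y was off.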

On 12.02.2024 21:07, Frédéric Ravetier wrote:
> You said the Y coordinate is different in a PDF; that is probably my
> problem. How do I get the right Y for the PDF?
>
> In my test the X seems OK but not the Y.
>
>
>
> On Mon, 12 Feb 2024, 19:30, Frédéric Ravetier <fr...@vikta.com> wrote:
>
>> My goal is to draw a rectangle over the text, in the same PDF or in a
>> copy, for example to hide the text or to draw a border around it to point
>> something out to the user.
>>
>>
>> On Mon, 12 Feb 2024 at 19:14, Tilman Hausherr <TH...@t-online.de> wrote:
>>
>>> It depends what you want to get. See the DrawPrintTextLocations.java
>>> example, which shows several strategies to get the bounding boxes of
>>> individual glyphs and draw them on the screen (not in a PDF, so the Y
>>> coordinate is different). You would have to adjust the
>>> "Rectangle2D.Float" code to whatever you prefer, or adjust
>>> DrawPrintTextLocations to collect words like the mkl code does.
>>>
>>> Tilman
>>>

Re: How to find coordinates of word and apply a mask

Posted by Frédéric Ravetier <fr...@vikta.com>.
You said the Y coordinate is different in a PDF; that is probably my
problem. How do I get the right Y for the PDF?

In my test the X seems OK but not the Y.




Re: How to find coordinates of word and apply a mask

Posted by Frédéric Ravetier <fr...@vikta.com>.
My goal is to draw a rectangle over the text, in the same PDF or in a copy,
for example to hide the text or to draw a border around it to point
something out to the user.



Re: How to find coordinates of word and apply a mask

Posted by Tilman Hausherr <TH...@t-online.de>.
It depends what you want to get. See the DrawPrintTextLocations.java
example, which shows several strategies to get the bounding boxes of
individual glyphs and draw them on the screen (not in a PDF, so the Y
coordinate is different). You would have to adjust the
"Rectangle2D.Float" code to whatever you prefer, or adjust
DrawPrintTextLocations to collect words like the mkl code does.

Tilman
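
One way to picture "collecting words" from per-glyph boxes is to union the glyph rectangles until whitespace is reached. The sketch below is plain Java with no PDFBox dependency; WordBoxes, Glyph and Word are hypothetical stand-ins for whatever the extraction step produces, and it is not the mkl implementation. It assumes the glyphs arrive in reading order and that words are separated by whitespace glyphs.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class WordBoxes {

    // Hypothetical stand-in for one extracted glyph: its text and its bounding box.
    static final class Glyph {
        final String text;
        final Rectangle2D box;
        Glyph(String text, Rectangle2D box) { this.text = text; this.box = box; }
    }

    // A collected word and the union of its glyph boxes.
    static final class Word {
        final String text;
        final Rectangle2D box;
        Word(String text, Rectangle2D box) { this.text = text; this.box = box; }
    }

    static List<Word> collect(List<Glyph> glyphs) {
        List<Word> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        Rectangle2D union = null;
        for (Glyph g : glyphs) {
            if (g.text.trim().isEmpty()) {
                // Whitespace ends the current word, if there is one.
                if (current.length() > 0) {
                    words.add(new Word(current.toString(), union));
                    current.setLength(0);
                    union = null;
                }
                continue;
            }
            current.append(g.text);
            // Start from a copy of the first glyph box, then grow it glyph by glyph.
            union = (union == null) ? g.box.getBounds2D() : union.createUnion(g.box);
        }
        if (current.length() > 0) {
            words.add(new Word(current.toString(), union));
        }
        return words;
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
                new Glyph("H", new Rectangle2D.Float(0f, 0f, 10f, 10f)),
                new Glyph("i", new Rectangle2D.Float(10f, 0f, 10f, 10f)),
                new Glyph(" ", new Rectangle2D.Float(20f, 0f, 5f, 10f)),
                new Glyph("y", new Rectangle2D.Float(25f, 0f, 10f, 12f)));
        for (Word w : collect(glyphs)) {
            System.out.println(w.text + " " + w.box);
        }
    }
}
```

The resulting word boxes are still in whatever coordinate system the glyph boxes used, so the Y conversion for drawing into the PDF remains a separate step.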

On 12.02.2024 18:48, Frédéric Ravetier wrote:
> Hello,
>
> I'd like to find some specific words in a PDF and draw a rectangle over
> these words.
> I'm using PDFBox 3.0.1
>
> I found this to locate the words :
> https://github.com/mkl-public/testarea-pdfbox2/blob/master/src/test/java/mkl/testarea/pdfbox2/extract/ExtractWordCoordinates.java
> As you can see in the println, :
> System.out.println(builder.toString() + " [(X=" + boundingBox.getX() +
> ",Y=" + boundingBox.getY()
>                       + ") height=" + boundingBox.getHeight() + " width=" +
> boundingBox.getWidth() + "]");
>
> I get :
> MYSTRING [(X=29.862407684326172,Y=383.78765869140625)
> height=7.098414897918701 width=50.3477668762207 ]
>
> in my prototype I print this information and copy and past x, y, height,
> width into a block of code hardcoded
>
> PDPage page = document.getPage(0);
> PDPageContentStream contentStream = new PDPageContentStream(document,
> page, PDPageContentStream.AppendMode.APPEND, false);
> contentStream.setNonStrokingColor(Color.RED);
> contentStream.addRect(29.862407684326172f, 383.78765869140625f,
> 50.3477668762207, 7.098414897918701f);
> contentStream.fill();
> contentStream.close();
> document.save(new FileOutputStream(src_file_path.replace(".pdf", "-rect.pdf")));
>
>
> But it does not match the text on the PDF.
> I tried to replace the height by the font size but it was not really better.
>
> Where is my mistake ?
>
> Best regards,
> Fred
>