Posted to dev@pdfbox.apache.org by Sireesha Chilakamarri <si...@gmail.com> on 2014/04/03 00:32:27 UTC

PDFTextPositions

Hi,

I would like to search for text and obtain the position (X/Y/Width/Height)
of each match.

Suppose the text "Hello_World" appears at different locations and on different
pages of the PDF document; I would like to get its X/Y/Width/Height for
every occurrence.

How do I achieve this?

Thank you,
Sireesha

Re: PDFTextPositions

Posted by Sireesha Chilakamarri <si...@gmail.com>.
Thank you, Alin. I appreciate your response.
If you could help with some sample code when you have time, I might get
a better idea of your explanation.

Sireesha



On Wed, Apr 2, 2014 at 4:31 PM, Alin Mazilu <im...@gmail.com> wrote:

> Not that I know of. PDFBox provides mostly low-level access to the PDF
> format. The only relatively easy way to do it would be to keep the
> TextPosition objects and also grab the text output of the PDFTextStripper.
> Then you can search the output (a String) for the position of the word you
> are looking for and get its position on the PDF page from the corresponding
> TextPosition objects. Other than that, I can think of other ways, but they
> would take longer to implement. Sorry, I would write a sample, but I'm not
> at my desk right now.
>
> Alin

Re: PDFTextPositions

Posted by Alin Mazilu <im...@gmail.com>.
Not that I know of. PDFBox provides mostly low-level access to the PDF
format. The only relatively easy way to do it would be to keep the
TextPosition objects and also grab the text output of the PDFTextStripper.
Then you can search the output (a String) for the position of the word you
are looking for and get its position on the PDF page from the corresponding
TextPosition objects. Other than that, I can think of other ways, but they
would take longer to implement. Sorry, I would write a sample, but I'm not
at my desk right now.

Alin
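
A minimal sketch of the approach described above (search a String, then map each hit back to the TextPosition objects that produced it), written in plain Java so it runs without PDFBox. `Glyph`, `Match` and `TextSearch` are hypothetical names, not PDFBox classes. In a real `PDFTextStripper` subclass you would presumably fill one `Glyph` per `TextPosition` (in PDFBox 2.x, for example, from `getUnicode()`, `getXDirAdj()`, `getYDirAdj() - getHeightDir()`, `getWidthDirAdj()` and `getHeightDir()`) and start a new list for each page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one PDFBox TextPosition (not a PDFBox class).
final class Glyph {
    final String text;   // usually one character; ligatures may carry more
    final float x;       // left edge
    final float top;     // top edge, origin at the top-left of the page
    final float width;
    final float height;

    Glyph(String text, float x, float top, float width, float height) {
        this.text = text;
        this.x = x;
        this.top = top;
        this.width = width;
        this.height = height;
    }
}

// One occurrence of the search string: 1-based page number plus bounding box.
final class Match {
    final int page;
    final float x, top, width, height;

    Match(int page, float x, float top, float width, float height) {
        this.page = page;
        this.x = x;
        this.top = top;
        this.width = width;
        this.height = height;
    }

    @Override
    public String toString() {
        return String.format("page %d: x=%.1f y=%.1f w=%.1f h=%.1f", page, x, top, width, height);
    }
}

final class TextSearch {
    // pages.get(i) holds the glyphs of page i + 1, in reading order.
    static List<Match> findAll(List<List<Glyph>> pages, String needle) {
        List<Match> matches = new ArrayList<>();
        if (needle.isEmpty()) {
            return matches;
        }
        for (int p = 0; p < pages.size(); p++) {
            List<Glyph> glyphs = pages.get(p);
            // Build the page text and remember which glyph produced each char.
            StringBuilder pageText = new StringBuilder();
            List<Integer> owner = new ArrayList<>();
            for (int g = 0; g < glyphs.size(); g++) {
                String t = glyphs.get(g).text;
                pageText.append(t);
                for (int k = 0; k < t.length(); k++) {
                    owner.add(g);
                }
            }
            // Each hit maps back to a contiguous run of glyphs; union their boxes.
            int idx = pageText.indexOf(needle);
            while (idx >= 0) {
                int first = owner.get(idx);
                int last = owner.get(idx + needle.length() - 1);
                float left = Float.POSITIVE_INFINITY, top = Float.POSITIVE_INFINITY;
                float right = Float.NEGATIVE_INFINITY, bottom = Float.NEGATIVE_INFINITY;
                for (int g = first; g <= last; g++) {
                    Glyph gl = glyphs.get(g);
                    left = Math.min(left, gl.x);
                    top = Math.min(top, gl.top);
                    right = Math.max(right, gl.x + gl.width);
                    bottom = Math.max(bottom, gl.top + gl.height);
                }
                matches.add(new Match(p + 1, left, top, right - left, bottom - top));
                idx = pageText.indexOf(needle, idx + 1); // overlapping hits allowed
            }
        }
        return matches;
    }
}
```

Because the page text here is only the concatenated glyphs, spaces that PDFTextStripper infers from spacing (rather than from actual space glyphs) are absent, so a search string containing a space may be missed. A hit that wraps onto the next line also yields a single box covering both lines.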


On Wed, Apr 2, 2014 at 7:01 PM, Sireesha Chilakamarri <
sireesha.charyulu@gmail.com> wrote:

> Hi Alin,
>
> I am able to run the PrintTextLocations example. This gives me the
> location details for every character.
>
> Is there an easier way to get coordinates for a word as a whole, instead of
> all its characters?
>
> To search for text, I used the method described in
> http://www.programming-free.com/2012/11/simple-word-search-in-pdf-files-using.html.
>
> Is there an easier way to search for text as well?
>
> Are there no direct APIs?
>
> Thank you,
> Sireesha

Re: PDFTextPositions

Posted by Sireesha Chilakamarri <si...@gmail.com>.
Hi Alin,

I am able to run the PrintTextLocations example. This gives me the
location details for every character.

Is there an easier way to get coordinates for a word as a whole, instead of
all its characters?

To search for text, I used the method described in
http://www.programming-free.com/2012/11/simple-word-search-in-pdf-files-using.html.

Is there an easier way to search for text as well?

Are there no direct APIs?

Thank you,
Sireesha
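
Getting a box for a whole word comes down to taking the union of its characters' boxes. The sketch below assumes the characters of one line have already been collected in reading order (for example from an overridden `processTextPosition(...)`); it splits them at whitespace glyphs or at a horizontal gap wider than `maxGap`, a hypothetical threshold you would have to tune per document. `Glyph`, `Word` and `WordGrouper` are illustrative names, not PDFBox API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for one PDFBox TextPosition (not a PDFBox class).
final class Glyph {
    final String text;   // usually one character
    final float x;       // left edge
    final float top;     // top edge, origin at the top-left of the page
    final float width;
    final float height;

    Glyph(String text, float x, float top, float width, float height) {
        this.text = text;
        this.x = x;
        this.top = top;
        this.width = width;
        this.height = height;
    }
}

// A word assembled from consecutive glyphs, with the union of their boxes.
final class Word {
    final String text;
    final float x, top, width, height;

    Word(String text, float x, float top, float width, float height) {
        this.text = text;
        this.x = x;
        this.top = top;
        this.width = width;
        this.height = height;
    }
}

final class WordGrouper {
    // Splits one line of glyphs into words. A word ends at a whitespace glyph
    // or when the gap to the next glyph is wider than maxGap.
    static List<Word> group(List<Glyph> line, float maxGap) {
        List<Word> words = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : line) {
            boolean isSpace = g.text.trim().isEmpty();
            if (!current.isEmpty()) {
                Glyph prev = current.get(current.size() - 1);
                boolean bigGap = g.x - (prev.x + prev.width) > maxGap;
                if (isSpace || bigGap) {
                    words.add(toWord(current));
                    current.clear();
                }
            }
            if (!isSpace) {
                current.add(g);
            }
        }
        if (!current.isEmpty()) {
            words.add(toWord(current));
        }
        return words;
    }

    // Concatenates the glyph text and takes the union of the glyph boxes.
    private static Word toWord(List<Glyph> glyphs) {
        StringBuilder text = new StringBuilder();
        float left = Float.POSITIVE_INFINITY, top = Float.POSITIVE_INFINITY;
        float right = Float.NEGATIVE_INFINITY, bottom = Float.NEGATIVE_INFINITY;
        for (Glyph g : glyphs) {
            text.append(g.text);
            left = Math.min(left, g.x);
            top = Math.min(top, g.top);
            right = Math.max(right, g.x + g.width);
            bottom = Math.max(bottom, g.top + g.height);
        }
        return new Word(text.toString(), left, top, right - left, bottom - top);
    }
}
```

The gap rule is there because some PDFs position words without emitting a space character at all. PDFTextStripper applies its own spacing heuristics when it builds its text output, and this sketch does not try to reproduce them.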


On Wed, Apr 2, 2014 at 3:55 PM, Alin Mazilu <im...@gmail.com> wrote:

> You have to extend the PDFTextStripper class and override the
> processTextPosition(...) method. From there, the logic is up to you. You
> can also override the writePage() method to grab the charactersByArticle
> Vector and then look for your words in there by iterating over
> it. Basically, in both cases you will grab all the TextPosition objects and
> work out the position and height/width from there.
>
> ~Alin

Re: PDFTextPositions

Posted by Alin Mazilu <im...@gmail.com>.
You have to extend the PDFTextStripper class and override the
processTextPosition(...) method. From there, the logic is up to you. You
can also override the writePage() method to grab the charactersByArticle
Vector and then look for your words in there by iterating over
it. Basically, in both cases you will grab all the TextPosition objects and
work out the position and height/width from there.

~Alin
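
Turning a single TextPosition into a rectangle needs some care with the y axis. A sketch, assuming PDFBox 2.x, an unrotated page and a crop box starting at (0, 0): `getYDirAdj()` gives the baseline measured down from the top of the page, so the glyph's top edge lies `getHeightDir()` above it, while PDF user space puts the origin at the bottom-left and flips the axis. `GlyphRect` is a hypothetical helper, not PDFBox API; `getHeightDir()` is only an approximate glyph height, and descenders below the baseline fall outside this box.

```java
// Hypothetical helper (not a PDFBox class). The constructor arguments are meant
// to be the values of TextPosition.getXDirAdj(), getYDirAdj(), getWidthDirAdj()
// and getHeightDir() in PDFBox 2.x.
// Assumes an unrotated page whose crop box starts at (0, 0).
final class GlyphRect {
    final float left;    // top-left-origin coordinates, y grows downward
    final float top;
    final float width;
    final float height;

    GlyphRect(float xDirAdj, float yDirAdj, float widthDirAdj, float heightDir) {
        // yDirAdj is the baseline measured down from the top of the page,
        // so the top edge of the box lies heightDir above it.
        this.left = xDirAdj;
        this.top = yDirAdj - heightDir;
        this.width = widthDirAdj;
        this.height = heightDir;
    }

    // Bottom edge (the baseline) in PDF user space, where y grows upward.
    float pdfBottom(float pageHeight) {
        return pageHeight - (top + height);
    }

    // Top edge in PDF user space.
    float pdfTop(float pageHeight) {
        return pageHeight - top;
    }
}
```

For example, a glyph with `getXDirAdj()` = 72, `getYDirAdj()` = 100, `getWidthDirAdj()` = 6 and `getHeightDir()` = 10 on a 792-point-high page spans y = 90 to 100 measured from the top, which is y = 692 to 702 in PDF user space.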


On Wed, Apr 2, 2014 at 6:32 PM, Sireesha Chilakamarri <
sireesha.charyulu@gmail.com> wrote:

> Hi,
>
> I would like to search for text and obtain the position (X/Y/Width/Height)
> of each match.
>
> Suppose the text "Hello_World" appears at different locations and on different
> pages of the PDF document; I would like to get its X/Y/Width/Height for
> every occurrence.
>
> How do I achieve this?
>
> Thank you,
> Sireesha
>