Posted to users@pdfbox.apache.org by Shubham Goswami <sh...@hotwax.co> on 2019/11/07 09:32:10 UTC

PDFBox library

Hello Community

I am using Apache PDFBox for the first time. I am trying to extract text
from PDF invoices with the getText() method of the PDFTextStripper class.
Some documents are read from left to right and others from right to left;
in fact, even within the same document, different values are read in
different orders.

Is there another way to get the data in the proper order, or to extract
only selected data? Otherwise, can we arrange the data manually as
required?
Any help will be appreciated. Thanks in advance.

-- 
Kind Regards,
Shubham Goswami
Enterprise Software Engineer
mobile: +91 7803886288
email: shubham.goswami@hotwax.co <ar...@hotwax.co>
*www.hotwax.co <http://www.hotwax.co/>*

Re: PDFBox library

Posted by Gilad Denneboom <gi...@gmail.com>.
Use the PDFTextStripperByArea class.
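
A minimal sketch of that approach, assuming PDFBox 2.0.x (the version
line whose javadocs are linked in this thread). The class name
InvoiceRegionExtractor, the region name "field" and the rectangle
coordinates are made up for illustration; the rectangle is given in
points and, as far as I know, measured from the top-left corner of the
page, so it has to be adjusted to the layout of the actual invoice.

```java
import java.awt.geom.Rectangle2D;
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripperByArea;

public class InvoiceRegionExtractor {

    /** Returns the text found inside one rectangular area of one page. */
    public static String extractRegion(PDDocument document, int pageIndex, Rectangle2D area)
            throws IOException {
        PDFTextStripperByArea stripper = new PDFTextStripperByArea();
        // Order the text inside the region by its position on the page.
        stripper.setSortByPosition(true);
        // "field" is an arbitrary label used to look the text up afterwards.
        stripper.addRegion("field", area);
        PDPage page = document.getPage(pageIndex);
        stripper.extractRegions(page);
        return stripper.getTextForRegion("field");
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the PDF file (assumed to have at least one page).
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            // Hypothetical area: x=50, y=50, width=300, height=100.
            Rectangle2D area = new Rectangle2D.Float(50, 50, 300, 100);
            System.out.println(extractRegion(document, 0, area));
        }
    }
}
```

Several regions can be registered on the same stripper with different
names before calling extractRegions(), which lets you pull out only the
fields you need and print them in whatever sequence you want.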

Re: PDFBox library

Posted by Shubham Goswami <sh...@hotwax.co>.
Hello Community

Please let me know if there is any way to get text from a particular area
of a PDF file as it appears in the document, or whether we can apply any
formatting to the text.
I want to extract only selected data from the document, not all of it,
and in a different sequence.
Thanks in advance.

Re: PDFBox library

Posted by Tilman Hausherr <TH...@t-online.de>.
Please try sorting by position:

https://pdfbox.apache.org/docs/2.0.7/javadocs/org/apache/pdfbox/text/PDFTextStripper.html#setSortByPosition(boolean)
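
For illustration, a minimal sketch of that setting, assuming PDFBox
2.0.x; the class name SortedTextExtractor is made up. By default the
stripper emits text in the order it is stored in the page's content
stream, which need not match the visual layout, and that would explain
values coming out in an unexpected order. With sorting enabled, the text
is ordered by its position on the page instead.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

public class SortedTextExtractor {

    /** Extracts all text from the document, ordered by position on the page. */
    public static String extractSorted(PDDocument document) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        // The default is false, i.e. content-stream order.
        stripper.setSortByPosition(true);
        return stripper.getText(document);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the PDF file.
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            System.out.print(extractSorted(document));
        }
    }
}
```

Sorting may not be enough for multi-column or table-like invoices; in
that case extracting fixed areas with PDFTextStripperByArea is probably
the more robust route.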

