Posted to users@pdfbox.apache.org by Slava G <sl...@gmail.com> on 2020/01/22 09:27:48 UTC

Incorrect text extraction of the PDF

Hi,
I have a PDF that looks fine in readers, but when I try to extract
text from it I get garbage.
What am I doing wrong?
The PDF is attached.
Thanks

Re: Incorrect text extraction of the PDF

Posted by Slava G <sl...@gmail.com>.
Thanks Maruan,
I got the explanation.
Slava

On Wed, Jan 22, 2020 at 12:18 PM Maruan Sahyoun <sa...@fileaffairs.de>
wrote:

> Hi,
>
> please take a look at the FAQ at
>
> https://pdfbox.apache.org/2.0/faq.html#how-come-i-am-getting-gibberishg38g43g36g51g5-when-extracting-text
>
> BR
> Maruan

Re: Incorrect text extraction of the PDF

Posted by Maruan Sahyoun <sa...@fileaffairs.de>.
Hi,

please take a look at the FAQ at 
https://pdfbox.apache.org/2.0/faq.html#how-come-i-am-getting-gibberishg38g43g36g51g5-when-extracting-text

BR
Maruan
 

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@pdfbox.apache.org
For additional commands, e-mail: users-help@pdfbox.apache.org


Re: Incorrect text extraction of the PDF

Posted by Gilad Denneboom <gi...@gmail.com>.
You can't attach files here directly. Upload the PDF to a file-sharing website
(Dropbox, Google Drive, etc.) and then post a link to it.

On Wed, Jan 22, 2020 at 10:28 AM Slava G <sl...@gmail.com> wrote:

> Hi,
> I have a PDF that looks fine in readers, but when I try to extract
> text from it I get garbage.
> What am I doing wrong?
> The PDF is attached.
> Thanks