Posted to users@pdfbox.apache.org by Omar Chiyean <om...@gmail.com> on 2010/02/19 21:10:24 UTC

Help with PDFBox

Is there a way to see which character encoding is used by the PDF that is
being stripped?

If so, can you tell me how to do it?

Thanks a lot!

Re: Help with PDFBox

Posted by Omar Chiyean <om...@gmail.com>.
Hi,
I used your code, but I get output like this:

org.apache.encoding.MacRomanEncoding@22d3

This information doesn't tell me anything about the encoding of the PDF,
because MacRoman is the default charset for Java on the Mac.

I want to know the encoding because some symbols and accented characters
are not processed correctly with PDFBox 1.0.0.

Thanks in advance...

2010/2/20 Villu Ruusmann <vi...@gmail.com>

> Hello there,
>
> On Fri, Feb 19, 2010 at 10:10 PM, Omar Chiyean <om...@gmail.com>
> wrote:
> > Is there a way to see which character encoding is used by the PDF that is
> > being stripped?
> >
>
> PDFTextStripper textStripper = new PDFTextStripper(){
>
>    @Override
>    public void processTextPosition(TextPosition text){
>        super.processTextPosition(text);
>
>        PDFont font = text.getFont();
>
>        Encoding fontEncoding = null;
>        try {
>            fontEncoding = font.getEncoding();
>        } catch(IOException ioe){
>            // Ignored
>        }
>
>        System.out.println(text.getCharacter() + " (font= " + font
>                + ", fontEncoding=" + fontEncoding + ")");
>    }
> };
>

Re: Help with PDFBox

Posted by Villu Ruusmann <vi...@gmail.com>.
Hello there,

On Fri, Feb 19, 2010 at 10:10 PM, Omar Chiyean <om...@gmail.com> wrote:
> Is there a way to see which character encoding is used by the PDF that is
> being stripped?
>

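// Imports for PDFBox 1.x package names
import java.io.IOException;
import org.apache.pdfbox.encoding.Encoding;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;

// Note: in PDFBox 1.x the PDFTextStripper constructor declares IOException,
// so the enclosing method has to catch or declare it.
PDFTextStripper textStripper = new PDFTextStripper(){

    // Called for each piece of text the stripper encounters
    @Override
    public void processTextPosition(TextPosition text){
        super.processTextPosition(text);

        // The font used to draw this piece of text
        PDFont font = text.getFont();

        Encoding fontEncoding = null;
        try {
            fontEncoding = font.getEncoding();
        } catch(IOException ioe){
            // Ignored; fontEncoding stays null
        }

        // The message is kept on one logical statement so it compiles
        System.out.println(text.getCharacter() + " (font= " + font
                + ", fontEncoding=" + fontEncoding + ")");
    }
};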
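
A note on reading the output of the snippet above: a value of the form
SomeClass@22d3 is what Java prints for any object whose class does not
override toString(), that is, the class name followed by a hex hash code.
It names the Encoding object returned by font.getEncoding() and is not
taken from the JVM default charset, which is a separate setting. The
following is only a minimal sketch of that distinction using the standard
library; MacRomanEncodingStandIn and describe() are hypothetical names made
up for illustration and are not part of PDFBox.

```java
import java.nio.charset.Charset;

class EncodingOutputDemo {

    // Hypothetical stand-in for an encoding class that does not override toString().
    // It is NOT a PDFBox class; it only mimics the printing behaviour.
    static class MacRomanEncodingStandIn {
    }

    // Returns a readable label for an object instead of the ClassName@hash form.
    static String describe(Object encoding) {
        return encoding == null ? "none" : encoding.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Object encoding = new MacRomanEncodingStandIn();

        // Default Object.toString(): class name, '@', hex hash code
        System.out.println(encoding);

        // Readable label derived from the class of the object
        System.out.println(describe(encoding));

        // The JVM default charset is queried separately and depends on the platform
        System.out.println(Charset.defaultCharset().name());
    }
}
```

In the stripper callback, printing the class name of fontEncoding in the
same way would give a more readable per-character report than the default
toString() output.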