Posted to users@pdfbox.apache.org by Raval Saurabh <sa...@yahoo.com> on 2010/03/09 21:54:18 UTC

PDFToImage does not work with microsoft created pdf

I am using PDFToImage, which works for basic PDF documents.

However, I am trying to use PDFToImage specifically to create image files from PDFs created by MS Office, and that seems to be the problem. It creates image files with a whole bunch of '?' characters. I thought I needed to install the MS fonts on my Ubuntu machine, but that did not fix the problem.

Do I need to do anything in particular to get this to work?

I am attaching the PDF file I am trying to convert.

Thanks,
-Saurabh Raval

Re: PDFToImage does not work with microsoft created pdf

Posted by steve poling <sd...@i2k.com>.
'?' characters usually have to do with the character set. MS Word loves 
to take straight single and double quotes and replace them with curved 
quotes, aka smart quotes. These characters lie beyond the normal ASCII 
character set, and when they aren't in whatever character set the 
renderer is using, you get either '?' or a square box.
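
One way to check whether smart quotes (or other non-ASCII characters) are 
involved is to scan the document's text for code points above U+007F. The 
following is only a minimal sketch in plain Java: it does not call any 
PDFBox API, it assumes you have already obtained the text as a String by 
some other means, and the class and method names (NonAsciiReport, 
findNonAscii, straightenQuotes) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PDFBox: reports characters outside ASCII.
public class NonAsciiReport {

    // Returns one entry per non-ASCII code point, e.g. "U+201C at index 0".
    static List<String> findNonAscii(String text) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                hits.add(String.format("U+%04X at index %d", cp, i));
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return hits;
    }

    // Replaces the four common curved quotes with their straight ASCII forms.
    static String straightenQuotes(String text) {
        return text.replace('\u2018', '\'')
                   .replace('\u2019', '\'')
                   .replace('\u201C', '"')
                   .replace('\u201D', '"');
    }

    public static void main(String[] args) {
        // Sample input with curved double quotes, as MS Word would produce.
        String sample = "\u201CHi\u201D";
        for (String hit : findNonAscii(sample)) {
            System.out.println(hit);
        }
        System.out.println(straightenQuotes(sample));
    }
}
```

If the report lists only U+2018, U+2019, U+201C and U+201D, smart quotes 
are a likely suspect. Other code points would point toward a broader font 
or encoding issue.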

Daniel Wilson wrote:
> '?' characters usually do indicate a font problem.  We do have several
> outstanding issues, especially with non-Latin character sets.
>
> What can you tell us about what characters / fonts you're using?
>
> Thanks.
>
> Daniel

Re: PDFToImage does not work with microsoft created pdf

Posted by Daniel Wilson <wi...@gmail.com>.
'?' characters usually do indicate a font problem.  We do have several
outstanding issues, especially with non-Latin character sets.

What can you tell us about what characters / fonts you're using?

Thanks.

Daniel
