Posted to user@tika.apache.org by Daniel Gibby <dg...@edirectpublishing.com> on 2013/11/18 21:10:20 UTC

PDF project

Is the PDF conversion a part of a separate project like the MS Word 
document conversion is?

I recently helped find and test a bug fix for a .docx conversion 
problem, and PDF conversion has various issues I'd like to help with as 
well.

Is the PDF conversion code all within the main Tika project, or is it 
separate?

Thanks!

Re: PDF project

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 18 Nov 2013, Daniel Gibby wrote:
> Is the PDF conversion a part of a separate project like the MS word 
> document conversion is?

Yup, very similar to the Word stuff. Tika uses Apache PDFBox, with custom 
Tika code to call it in the right way. Therefore, some fixes will go 
directly into Tika, while others will need upstream fixes in PDFBox.

Nick