Posted to user@tika.apache.org by Daniel Gibby <dg...@edirectpublishing.com> on 2013/11/18 21:10:20 UTC
PDF project
Is the PDF conversion part of a separate project, like the MS Word
document conversion is?
I recently helped find and test a bug fix for a .docx conversion
problem, and PDF conversion has various issues I'd like to help with as
well.
Is the PDF conversion code all within the main Tika project, or is it
separate?
Thanks!
Re: PDF project
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 18 Nov 2013, Daniel Gibby wrote:
> Is the PDF conversion a part of a separate project like the MS word
> document conversion is?
Yup, very similar to the Word stuff. Tika uses Apache PDFBox, with custom
Tika code to call it in the right way. Therefore, some fixes will go
directly into Tika, while others will need to be fixed upstream in PDFBox.
Nick