Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2016/09/15 16:50:31 UTC

[Tika Wiki] Update of "Troubleshooting Tika" by TimothyAllison

The "Troubleshooting Tika" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/Troubleshooting%20Tika?action=diff&rev1=10&rev2=11

  == PDF Text Problems ==
  If Tika isn't extracting the right text from a PDF, or is giving errors, the first step is to determine whether the problem lies in Tika itself or in the underlying Apache PDFBox library.
  
- To check, grab the latest [[http://pdfbox.apache.org/download.cgi|Apache PDFBox pdfbox-app jar]] and use the [[http://pdfbox.apache.org/2.0/commandline.html#extracttext|ExtractText command line tool]] on your problematic PDF. 
+ To check, grab the latest [[http://pdfbox.apache.org/download.cgi|Apache PDFBox pdfbox-app jar]] and use the [[http://pdfbox.apache.org/2.0/commandline.html#extracttext|ExtractText command line tool]] on your problematic PDF:
+ {{{
+ java -jar pdfbox-app.X.Y.jar ExtractText problematicPDF.pdf
+ }}}
  
  If that shows the same problem, it's a PDFBox bug. Please [[http://pdfbox.apache.org/support.html|file an Apache PDFBox bug report]] and attach at least one failing file to the bug. Once the bug is fixed, Tika will pick up the new PDFBox release and, with it, the fix.