Posted to commits@lucene.apache.org by Apache Wiki <wi...@apache.org> on 2013/12/02 16:00:53 UTC
[Lucene-java Wiki] Update of "PDF" by SteveRowe
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.
The "PDF" page has been changed by SteveRowe:
https://wiki.apache.org/lucene-java/PDF?action=diff&rev1=2&rev2=3
Comment:
fix pdfbox link
== Extracting text from a PDF document ==
If you need to index the content of a PDF, a good place to start is PDFBox, a Java library:
- http://www.pdfbox.org/userguide/text_extraction.html
+ http://pdfbox.apache.org/cookbook/textextraction.html