Posted to commits@tika.apache.org by Apache Wiki <wi...@apache.org> on 2015/08/19 19:36:22 UTC

[Tika Wiki] Update of "GrobidJournalParser" by NickBurch

The "GrobidJournalParser" page has been changed by NickBurch:
https://wiki.apache.org/tika/GrobidJournalParser?action=diff&rev1=7&rev2=8

Comment:
Add a note on binaries, and where to track the progress

  The GrobidJournalParser uses the [[http://grobid.readthedocs.org/en/latest/Introduction/|GROBID (or Grobid) GeneRation Of BIbliographic Data]] machine learning framework to parse PDF files and extract information such as title, abstract, authors, affiliations, and keywords from journal publications. The parser has been integrated into Tika; follow this guide to get it working on your system.
  
  == Installing GROBID ==
+ To install GROBID, it is currently necessary to start from the source code. We are working with the GROBID community to get pre-built binaries into Maven Central, which is being tracked in [[https://github.com/kermitt2/grobid/issues/59|issue #59]]. For now, a Git checkout of HEAD is recommended, as detailed here.
  
  You should be able to install GROBID from a Git checkout such as the one below.