Posted to issues@opennlp.apache.org by "Olivier Grisel (JIRA)" <ji...@apache.org> on 2011/07/04 14:04:21 UTC
[jira] [Updated] (OPENNLP-211) Add a Wikinews parser to the wikinews-importer
[ https://issues.apache.org/jira/browse/OPENNLP-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olivier Grisel updated OPENNLP-211:
-----------------------------------
Attachment: TestWikipediaParsing.java
ParsingWikipediaLoader.java
Annotation.java
AnnotatingMarkupParser.java
Please feel free to reuse and adapt the following classes from the pignlproc project.
The meat is in AnnotatingMarkupParser.java and Annotation.java.
Sample usage (in a Pig context, just as a reference) is in ParsingWikipediaLoader.java.
Some tests for AnnotatingMarkupParser are available in TestWikipediaParsing.java.
The main class depends on the following library:
https://code.google.com/p/gwtwiki/ (licensed under EPL 1.0, hence compatible with the ASF rules if distributed only in binary format, e.g. through Maven):
<dependency>
  <groupId>info.bliki.wiki</groupId>
  <artifactId>bliki-core</artifactId>
  <version>3.0.16</version>
</dependency>
Note: the gwtwiki project now features a new API with helpers dedicated to parsing MediaWiki markup dumps:
https://code.google.com/p/gwtwiki/wiki/MediaWikiDumpSupport
IIRC those helpers were not available when I started the pignlproc tools. Might be useful to investigate directly too.
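As a rough, library-independent sketch of the very first step (pulling the raw markup of each article out of a dump), something like the following could work. It assumes the standard MediaWiki XML export layout with page, title and text elements, and uses only the StAX API that ships with the JDK. DumpPageReader and its Page holder are hypothetical names made up for this illustration; they are not part of pignlproc, gwtwiki or OpenNLP.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper for illustration only; not an existing pignlproc or OpenNLP class. */
final class DumpPageReader {

    /** One article from the dump: its title and its raw wiki markup. */
    static final class Page {
        final String title;
        final String markup;

        Page(String title, String markup) {
            this.title = title;
            this.markup = markup;
        }
    }

    /** Streams the dump and collects title and markup for every page element. */
    static List<Page> readPages(Reader dump) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Dumps are external input, so switch off DTDs and external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, Boolean.FALSE);
        XMLStreamReader xml = factory.createXMLStreamReader(dump);

        List<Page> pages = new ArrayList<>();
        String title = null;
        String markup = null;
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if ("page".equals(name)) {
                        // A new article starts: forget the previous one.
                        title = null;
                        markup = null;
                    } else if ("title".equals(name)) {
                        title = xml.getElementText();
                    } else if ("text".equals(name)) {
                        // Assumed layout: the markup sits in revision/text.
                        markup = xml.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(xml.getLocalName())) {
                    if (title != null && markup != null) {
                        pages.add(new Page(title, markup));
                    }
                }
            }
        } finally {
            xml.close();
        }
        return pages;
    }
}
```

The markup string of each Page is what a markup parser such as AnnotatingMarkupParser would then consume. For a full dump the pages should be handed to a callback one at a time rather than collected in a list; the list is only there to keep the sketch short.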
> Add a Wikinews parser to the wikinews-importer
> ----------------------------------------------
>
> Key: OPENNLP-211
> URL: https://issues.apache.org/jira/browse/OPENNLP-211
> Project: OpenNLP
> Issue Type: Task
> Reporter: Jörn Kottmann
> Attachments: AnnotatingMarkupParser.java, Annotation.java, ParsingWikipediaLoader.java, TestWikipediaParsing.java
>
>
> The current wikinews-importer can only load existing XMI files. That should be fixed by adding a proper Wikinews parser which can turn the Wikinews dump into UIMA CASes.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira