Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/07/04 12:31:27 UTC
[jira] [Created] (OPENNLP-211) Add a Wikinews parser to the wikinews-importer
Add a Wikinews parser to the wikinews-importer
----------------------------------------------
Key: OPENNLP-211
URL: https://issues.apache.org/jira/browse/OPENNLP-211
Project: OpenNLP
Issue Type: Task
Reporter: Jörn Kottmann
The current wikinews-importer can only load existing XMI files. That should be fixed by adding a proper Wikinews parser which can turn the Wikinews dump into UIMA CASes.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Closed] (OPENNLP-211) Add a Wikinews parser to the wikinews-importer
Posted by "Joern Kottmann (Closed) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joern Kottmann closed OPENNLP-211.
----------------------------------
Resolution: Fixed
Assignee: Joern Kottmann
> Add a Wikinews parser to the wikinews-importer
> ----------------------------------------------
>
> Key: OPENNLP-211
> URL: https://issues.apache.org/jira/browse/OPENNLP-211
> Project: OpenNLP
> Issue Type: Task
> Reporter: Joern Kottmann
> Assignee: Joern Kottmann
> Attachments: AnnotatingMarkupParser.java, Annotation.java, ParsingWikipediaLoader.java, TestWikipediaParsing.java
>
>
> The current wikinews-importer can only load existing XMI files. That should be fixed by adding a proper Wikinews parser which can turn the Wikinews dump into UIMA CASes.
[jira] [Issue Comment Edited] (OPENNLP-211) Add a Wikinews parser to the wikinews-importer
Posted by "Olivier Grisel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059403#comment-13059403 ]
Olivier Grisel edited comment on OPENNLP-211 at 7/4/11 12:06 PM:
-----------------------------------------------------------------
Please feel free to reuse and adapt the following classes from the pignlproc project.
The meat is in AnnotatingMarkupParser.java / Annotation.java.
Sample usage (in a Pig context, just as a reference) is in ParsingWikipediaLoader.java.
Some tests for AnnotatingMarkupParser are available in TestWikipediaParsing.java.
The main class is based on the following dependency:
https://code.google.com/p/gwtwiki/ (licensed under EPL 1.0, hence compatible with the ASF rules if distributed only in binary format, e.g. through Maven):
<dependency>
  <groupId>info.bliki.wiki</groupId>
  <artifactId>bliki-core</artifactId>
  <version>3.0.16</version>
</dependency>
Note: the gwtwiki project now features a new API with helpers dedicated to MediaWiki markup dump parsing here:
https://code.google.com/p/gwtwiki/wiki/MediaWikiDumpSupport
IIRC those helpers were not available when I started the pignlproc tools. It might be useful to investigate them directly too.
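For reference, the first step of such a parser, reading pages out of a MediaWiki XML dump, can be sketched with plain JDK StAX. This is only a minimal sketch under stated assumptions: it does not use the gwtwiki helpers or AnnotatingMarkupParser, the class name DumpReaderSketch and its Page holder are hypothetical, and it assumes the usual dump layout of <page> elements containing <title> and <revision>/<text>. Markup stripping and CAS creation are left out.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of OpenNLP, pignlproc or gwtwiki.
final class DumpReaderSketch {

    /** One article from the dump: its title and its raw wiki markup. */
    static final class Page {
        final String title;
        final String text;

        Page(String title, String text) {
            this.title = title;
            this.text = text;
        }
    }

    /** Streams through the dump and collects (title, raw markup) pairs. */
    static List<Page> readPages(Reader in) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Dumps carry no DTD, so DTD processing is switched off.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        XMLStreamReader xml = factory.createXMLStreamReader(in);

        List<Page> pages = new ArrayList<>();
        String title = null;
        String text = null;
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    // getLocalName() ignores the export namespace of the dump.
                    String name = xml.getLocalName();
                    if ("page".equals(name)) {
                        title = null;
                        text = null;
                    } else if ("title".equals(name)) {
                        title = xml.getElementText();
                    } else if ("text".equals(name)) {
                        text = xml.getElementText();
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "page".equals(xml.getLocalName())) {
                    // Emit the page once its closing tag is reached.
                    if (title != null && text != null) {
                        pages.add(new Page(title, text));
                    }
                }
            }
        } finally {
            xml.close();
        }
        return pages;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny made-up dump fragment, for illustration only.
        String sample = "<mediawiki><page><title>Example</title>"
                + "<revision><text>Some [[wiki]] markup</text></revision>"
                + "</page></mediawiki>";
        for (Page page : readPages(new StringReader(sample))) {
            System.out.println(page.title + ": " + page.text);
        }
    }
}
```

Each Page.text would then be handed to a markup parser (for example bliki-core) before filling a CAS. For a full-history dump with several revisions per page, this sketch keeps only the text of the last revision.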
> Add a Wikinews parser to the wikinews-importer
> ----------------------------------------------
>
> Key: OPENNLP-211
> URL: https://issues.apache.org/jira/browse/OPENNLP-211
> Project: OpenNLP
> Issue Type: Task
> Reporter: Jörn Kottmann
> Attachments: AnnotatingMarkupParser.java, Annotation.java, ParsingWikipediaLoader.java, TestWikipediaParsing.java
>
>
> The current wikinews-importer can only load existing XMI files. That should be fixed by adding a proper Wikinews parser which can turn the Wikinews dump into UIMA CASes.
[jira] [Updated] (OPENNLP-211) Add a Wikinews parser to the wikinews-importer
Posted by "Olivier Grisel (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/OPENNLP-211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olivier Grisel updated OPENNLP-211:
-----------------------------------
Attachment: TestWikipediaParsing.java, ParsingWikipediaLoader.java, Annotation.java, AnnotatingMarkupParser.java
Please feel free to reuse and adapt the following classes from the pignlproc project.
The meat is in AnnotatingMarkupParser.java / Annotation.java.
Sample usage (in a Pig context, just as a reference) is in ParsingWikipediaLoader.java.
Some tests for AnnotatingMarkupParser are available in TestWikipediaParsing.java.
The main class is based on the following dependency:
https://code.google.com/p/gwtwiki/ (licensed under EPL 1.0, hence compatible with the ASF rules if distributed only in binary format, e.g. through Maven):
<dependency>
  <groupId>info.bliki.wiki</groupId>
  <artifactId>bliki-core</artifactId>
  <version>3.0.16</version>
</dependency>
Note: the gwtwiki project now features a new API with helpers dedicated to MediaWiki markup dump parsing here:
https://code.google.com/p/gwtwiki/wiki/MediaWikiDumpSupport
IIRC those helpers were not available when I started the pignlproc tools. It might be useful to investigate them directly too.
> Add a Wikinews parser to the wikinews-importer
> ----------------------------------------------
>
> Key: OPENNLP-211
> URL: https://issues.apache.org/jira/browse/OPENNLP-211
> Project: OpenNLP
> Issue Type: Task
> Reporter: Jörn Kottmann
> Attachments: AnnotatingMarkupParser.java, Annotation.java, ParsingWikipediaLoader.java, TestWikipediaParsing.java
>
>
> The current wikinews-importer can only load existing XMI files. That should be fixed by adding a proper Wikinews parser which can turn the Wikinews dump into UIMA CASes.