Posted to dev@nutch.apache.org by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/21 15:57:42 UTC
[jira] [Comment Edited] (NUTCH-1648) Sentence Detection plugin
[ https://issues.apache.org/jira/browse/NUTCH-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800636#comment-13800636 ]
İlhami KALKAN edited comment on NUTCH-1648 at 10/21/13 1:56 PM:
----------------------------------------------------------------
I attached a patch that adds OpenNLP-based sentence detection for crawled
webpage content. During the parse phase, the plugin detects sentence
boundaries in the page content and marks them with '**|**'. The .bin files
are not included in the patch file, so I attached them separately as
models.zip. Unzip models.zip and copy all of its files into the
../src/plugin/sentence-detection-opennlp/models/ directory.
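The marking step described above can be sketched roughly as follows. This is
not code from the patch: it swaps in the JDK's rule-based
java.text.BreakIterator for the OpenNLP model-based detector so that it runs
without the .bin files. The class name SentenceMarker and the method mark are
hypothetical, and placing the marker between sentences (rather than after
each one) is an assumption.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not taken from sent-detect-opennlp.patch.
final class SentenceMarker {

    // Splits text into sentences with the JDK's rule-based BreakIterator
    // (a stand-in for an OpenNLP sentence model) and joins the trimmed
    // sentences with the given delimiter.
    static String mark(String text, String delimiter) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(text);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return String.join(delimiter, sentences);
    }

    public static void main(String[] args) {
        // Marks the boundary between two simple English sentences.
        System.out.println(mark("Nutch crawls pages. This plugin marks sentences.", "**|**"));
    }
}
```

A real plugin would presumably load a trained model from the models/
directory instead; a rule-based splitter like this one is only meant to show
where the '**|**' marker ends up in the parsed text.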
was (Author: ilhamikalkan):
I attached a patch that adds OpenNLP-based sentence detection for crawled
webpage content. During the parse phase, the plugin detects sentence
boundaries in the page content and marks them with '*|*'. The .bin files
are not included in the patch file, so I attached them separately as
models.zip. Unzip models.zip and copy all of its files into the
../src/plugin/sentence-detection-opennlp/models/ directory.
> Sentence Detection plugin
> -------------------------
>
> Key: NUTCH-1648
> URL: https://issues.apache.org/jira/browse/NUTCH-1648
> Project: Nutch
> Issue Type: Improvement
> Affects Versions: 2.2.1
> Reporter: İlhami KALKAN
> Priority: Minor
> Attachments: models.zip, sent-detect-opennlp.patch
>
>
> During the parse phase, we need a plugin that detects sentence boundaries in page content.
--
This message was sent by Atlassian JIRA
(v6.1#6144)