Posted to dev@nutch.apache.org by "İlhami KALKAN (JIRA)" <ji...@apache.org> on 2013/10/21 15:57:42 UTC

[jira] [Comment Edited] (NUTCH-1648) Sentence Detection plugin

    [ https://issues.apache.org/jira/browse/NUTCH-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13800636#comment-13800636 ] 

İlhami KALKAN edited comment on NUTCH-1648 at 10/21/13 1:56 PM:
----------------------------------------------------------------

I have attached a patch that adds OpenNLP-based sentence detection for 
crawled webpage content. During the parse phase, the plugin detects the 
sentence boundaries in the page content and marks them with '**|**'. The 
.bin model files are not included in the patch file, so I attached them 
separately as models.zip. Unzip models.zip and copy all of its files into the 
../src/plugin/sentence-detection-opennlp/models/ directory.
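
To illustrate the marking step described above, here is a minimal sketch. It 
is not the code from the patch: it swaps in the JDK's java.text.BreakIterator 
for the OpenNLP detector so that it runs without the files in models.zip. The 
class name SentenceMarker, the methods split and mark, and the "|" marker 
value passed in main are all assumptions made for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of the patch: marks sentence boundaries in text.
final class SentenceMarker {

    // Splits text into trimmed sentences with the JDK BreakIterator
    // (a stand-in for the OpenNLP sentence detector the plugin uses).
    static List<String> split(String text, Locale locale) {
        List<String> sentences = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    // Joins the detected sentences with the given marker between them.
    static String mark(String text, String marker, Locale locale) {
        return String.join(marker, split(text, locale));
    }

    public static void main(String[] args) {
        // The "|" marker here is an assumed value for the example.
        System.out.println(mark("Hello world. How are you?", "|", Locale.ENGLISH));
    }
}
```

In a real parse plugin, the text passed to mark would be the parsed page 
content, and split would call the OpenNLP detector loaded from one of the 
.bin models instead of BreakIterator.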


was (Author: ilhamikalkan):
I have attached a patch that adds OpenNLP-based sentence detection for 
crawled webpage content. During the parse phase, the plugin detects the 
sentence boundaries in the page content and marks them with '*|*'. The 
.bin model files are not included in the patch file, so I attached them 
separately as models.zip. Unzip models.zip and copy all of its files into the 
../src/plugin/sentence-detection-opennlp/models/ directory.

> Sentence Detection plugin
> -------------------------
>
>                 Key: NUTCH-1648
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1648
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 2.2.1
>            Reporter: İlhami KALKAN
>            Priority: Minor
>         Attachments: models.zip, sent-detect-opennlp.patch
>
>
> During parsing, we need a plugin that detects sentence boundaries in page content. 



--
This message was sent by Atlassian JIRA
(v6.1#6144)