Posted to dev@nutch.apache.org by "Kim Whitehall (JIRA)" <ji...@apache.org> on 2015/09/28 17:12:04 UTC

[jira] [Updated] (NUTCH-2125) Metrics

     [ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kim Whitehall updated NUTCH-2125:
---------------------------------
    Description: 
Purpose: a metric for determining the “relevancy” of a crawl after each round and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms will be stored.

- Return the topN terms per page

- Return the topN terms per segment based on tf-idf

- Leverage Apache Lucene libs

  was:
Purpose: a metric for determining the “relevancy” of a crawl after each round and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms will be stored.

- Return the topN terms per page

- Return the topN terms per segment based on td-idf

- Leverage Apache Lucene libs


> Metrics
> -------
>
>                 Key: NUTCH-2125
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2125
>             Project: Nutch
>          Issue Type: Improvement
>          Components: tool
>    Affects Versions: 1.10
>            Reporter: Kim Whitehall
>              Labels: memex
>
> Purpose: a metric for determining the “relevancy” of a crawl after each round and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms will be stored.
> - Return the topN terms per page
> - Return the topN terms per segment based on tf-idf
> - Leverage Apache Lucene libs
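
To make the two requested outputs concrete, here is a minimal sketch of how topN terms per page and per segment could be computed. It is an illustration only, not the proposed Nutch tool: it uses plain Python rather than Apache Lucene, it treats a segment as a list of page texts, and the function names, the regex tokenizer, and the smoothed idf formula are all assumptions made for this example. The default of 25 follows the description above.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (assumed stand-in for a Lucene analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def top_terms_per_page(text, n=25):
    """Return the n most frequent terms of one page as (term, count) pairs."""
    counts = Counter(tokenize(text))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def top_terms_per_segment(pages, n=25):
    """Return the n terms with the highest tf-idf summed over all pages of a segment."""
    page_counts = [Counter(tokenize(page)) for page in pages]
    num_pages = len(page_counts)

    # Document frequency: in how many pages each term occurs at least once.
    doc_freq = Counter()
    for counts in page_counts:
        doc_freq.update(counts.keys())

    scores = Counter()
    for counts in page_counts:
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total  # term frequency, normalized by page length
            # Smoothed idf (an assumption of this sketch) so that terms
            # occurring in every page still receive a non-zero weight.
            idf = math.log((1 + num_pages) / (1 + doc_freq[term])) + 1.0
            scores[term] += tf * idf
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Calling top_terms_per_segment once per crawl round and comparing the returned term lists is one way such a metric could indicate whether a crawl is staying on topic.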



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)