Posted to dev@nutch.apache.org by "Kim Whitehall (JIRA)" <ji...@apache.org> on 2015/09/28 17:12:04 UTC
[jira] [Updated] (NUTCH-2125) Metrics
[ https://issues.apache.org/jira/browse/NUTCH-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kim Whitehall updated NUTCH-2125:
---------------------------------
Description:
Purpose: a metric for determining the “relevancy” of a crawl after each round and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms will be stored.
- Return the topN terms per page
- Return the topN terms per segment, based on tf-idf
- Leverage Apache Lucene libs
was:
Purpose: a metric for determining if the “relevancy” of a crawl after each round and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms will be stored.
- Return the topN terms per a page
- Return the topN terms per a segment based on td-idf
- Leverage Apache Lucene libs
> Metrics
> -------
>
> Key: NUTCH-2125
> URL: https://issues.apache.org/jira/browse/NUTCH-2125
> Project: Nutch
> Issue Type: Improvement
> Components: tool
> Affects Versions: 1.10
> Reporter: Kim Whitehall
> Labels: memex
>
> Purpose: a metric for determining the “relevancy” of a crawl after each round and the “relevancy” of a page. NB: this is not a scoring plugin. By default, the first 25 terms will be stored.
> - Return the topN terms per page
> - Return the topN terms per segment, based on tf-idf
> - Leverage Apache Lucene libs
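
For illustration only, the two "topN" computations in the description could be sketched roughly as below. This is not the proposed patch and does not use Lucene or Nutch APIs: the class name TopTermsSketch, its methods, the regex tokenizer, and the unsmoothed idf = ln(pages / pagesContainingTerm) are all assumptions made for the example. Only the default of 25 terms comes from the issue text.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Nutch or Lucene.
final class TopTermsSketch {

    // Default number of stored terms, as stated in the issue description.
    static final int DEFAULT_TOP_N = 25;

    // Lowercase and split on non-letter/non-digit runs.
    // Assumption: a stand-in for whatever Lucene analyzer the tool would use.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Raw term counts for a single page.
    static Map<String, Double> termFrequencies(String pageText) {
        Map<String, Double> tf = new HashMap<>();
        for (String t : tokenize(pageText)) {
            tf.merge(t, 1.0, Double::sum);
        }
        return tf;
    }

    // Segment-level score per term: (total count in segment) * ln(N / df),
    // where N is the number of pages and df the number of pages containing the term.
    // Assumption: unsmoothed idf, so a term present on every page scores 0.
    static Map<String, Double> segmentTfIdf(List<String> pages) {
        Map<String, Double> totalTf = new HashMap<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String page : pages) {
            for (Map.Entry<String, Double> e : termFrequencies(page).entrySet()) {
                totalTf.merge(e.getKey(), e.getValue(), Double::sum);
                docFreq.merge(e.getKey(), 1, Integer::sum);
            }
        }
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Double> e : totalTf.entrySet()) {
            double idf = Math.log((double) pages.size() / docFreq.get(e.getKey()));
            scores.put(e.getKey(), e.getValue() * idf);
        }
        return scores;
    }

    // Keys of the n highest-scoring entries; ties are broken alphabetically
    // so the result is deterministic.
    static List<String> topN(Map<String, Double> scores, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

Under these assumptions, the per-page list would be topN(termFrequencies(pageText), DEFAULT_TOP_N) and the per-segment list would be topN(segmentTfIdf(pagesInSegment), DEFAULT_TOP_N). A real implementation would presumably take term statistics from Lucene rather than recounting tokens by hand.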