Posted to dev@nutch.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/02 01:14:25 UTC

[jira] [Commented] (NUTCH-2245) Developed the NGram Model on the existing Unigram Cosine Similarity Model

    [ https://issues.apache.org/jira/browse/NUTCH-2245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222510#comment-15222510 ] 

ASF GitHub Bot commented on NUTCH-2245:
---------------------------------------

Github user lewismc commented on a diff in the pull request:

    https://github.com/apache/nutch/pull/101#discussion_r58279977
  
    --- Diff: src/plugin/scoring-similarity/src/java/org/apache/nutch/scoring/similarity/cosine/Model.java ---
    @@ -68,6 +68,11 @@ public static synchronized void createModel(Configuration conf) throws IOExcepti
             }
             LOG.info("Loaded custom stopwords from {}",conf.get("scoring.similarity.stopword.file"));
           }
    +
    +      //Check if user has specified n for ngram cosine model
    +      int ngram = conf.getInt("scoring.similarity.ngrams", 1);
    +      LOG.info("Value of ngram: "+ngram);
    --- End diff --
    
    Please use the correct, efficient SLF4J parameterized logging notation here,
    e.g. LOG.info("Value of ngram: {}", ngram);
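The review comment calls the `{}` placeholder form "efficient". With `"Value of ngram: " + ngram`, the message string is built before `LOG.info` runs, even when INFO logging is disabled. With `LOG.info("Value of ngram: {}", ngram)`, SLF4J substitutes the argument only if the level is enabled. The toy class below illustrates that idea in plain Java. It is NOT SLF4J's implementation. The names `ParameterizedLoggingDemo`, `info`, `format`, `infoEnabled` and `formatCalls` are hypothetical and exist only to count how often a message actually gets formatted.

```java
/**
 * Toy illustration (not SLF4J's code) of why "{}" placeholders avoid
 * string-building work when a log level is disabled.
 */
public class ParameterizedLoggingDemo {

    /** Hypothetical counter: how many times a message string was actually built. */
    static int formatCalls = 0;

    /** Hypothetical toggle standing in for "is INFO enabled for this logger?". */
    static boolean infoEnabled = false;

    /** Replaces each "{}" in the pattern with the next argument, left to right. */
    static String format(String pattern, Object... args) {
        formatCalls++;
        StringBuilder sb = new StringBuilder();
        int argIndex = 0;
        int from = 0;
        int at;
        while ((at = pattern.indexOf("{}", from)) >= 0 && argIndex < args.length) {
            sb.append(pattern, from, at).append(args[argIndex++]);
            from = at + 2; // skip past the "{}" placeholder
        }
        sb.append(pattern, from, pattern.length()); // copy any trailing text
        return sb.toString();
    }

    /** Formats and prints only when INFO is enabled; otherwise no message is formatted. */
    static void info(String pattern, Object... args) {
        if (infoEnabled) {
            System.out.println(format(pattern, args));
        }
    }

    public static void main(String[] args) {
        int ngram = 3;

        infoEnabled = false;
        info("Value of ngram: {}", ngram); // level disabled: format() is never called
        System.out.println("format calls while disabled: " + formatCalls);

        infoEnabled = true;
        info("Value of ngram: {}", ngram); // level enabled: message is built and printed
    }
}
```

Running `main` prints `format calls while disabled: 0` followed by `Value of ngram: 3`. With concatenation the string would have been assembled on both calls. (Passing `ngram` still autoboxes it, so the saving is the formatting work rather than every allocation.)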


> Developed the NGram Model on the existing Unigram Cosine Similarity Model
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-2245
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2245
>             Project: Nutch
>          Issue Type: New Feature
>          Components: plugin, scoring
>            Reporter: Bhavya Sanghavi
>            Assignee: Sujen Shah
>            Priority: Minor
>              Labels: memex
>
> Builds on the existing unigram cosine similarity model by adding an n-gram model, giving the user the flexibility to choose the window size (n) used when scoring the similarity between web pages and the gold standard.
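The idea in the description can be sketched as follows. The pages and the gold standard are each turned into term-frequency vectors whose terms are word n-grams rather than single words. The cosine of the angle between the two vectors is then used as the score. The window size `n` plays the role of the `scoring.similarity.ngrams` setting in the diff above (default 1, i.e. unigrams). This is a minimal sketch under stated assumptions, not the Nutch patch. The class and method names are hypothetical. It lowercases and splits on non-letter/digit characters, does no stopword removal or stemming, and counts only n-grams of exactly size `n`. The actual model may tokenize differently or combine several gram sizes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch of word n-gram cosine similarity (assumptions: simple
 * regex tokenization, no stopwords/stemming, only grams of exactly size n).
 */
public class NGramCosineSketch {

    /** Builds a term-frequency vector of word n-grams, n >= 1. */
    static Map<String, Integer> ngramVector(String text, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        List<String> words = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) words.add(t); // split may yield a leading empty token
        }
        Map<String, Integer> tf = new HashMap<>();
        for (int i = 0; i + n <= words.size(); i++) {
            String gram = String.join(" ", words.subList(i, i + n));
            tf.merge(gram, 1, Integer::sum); // count each sliding window of n words
        }
        return tf;
    }

    /** Cosine similarity of two sparse term-frequency vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) dot += (double) e.getValue() * other;
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0 || normB == 0) return 0.0;
        return dot / (normA * normB);
    }

    /** Euclidean length of a term-frequency vector. */
    static double norm(Map<String, Integer> v) {
        double sum = 0;
        for (int x : v.values()) sum += (double) x * x;
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        String gold = "apache nutch web crawler";
        String page = "nutch is an apache web crawler";
        for (int n = 1; n <= 2; n++) {
            double s = cosine(ngramVector(gold, n), ngramVector(page, n));
            System.out.printf(Locale.ROOT, "n=%d similarity=%.3f%n", n, s);
        }
    }
}
```

With these two example strings, `main` prints `n=1 similarity=0.816` and `n=2 similarity=0.258`. The page contains every gold-standard word, so the unigram score is high. Only the bigram "web crawler" appears in both, so the bigram score drops. This illustrates how a larger window makes the score sensitive to word order as well as vocabulary.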



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)