Posted to dev@nutch.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2005/04/13 19:24:20 UTC

[jira] Closed: (NUTCH-5) Hit limiter off-by-one bug

     [ http://issues.apache.org/jira/browse/NUTCH-5?page=history ]
     
Doug Cutting closed NUTCH-5:
----------------------------


> Hit limiter off-by-one bug
> --------------------------
>
>          Key: NUTCH-5
>          URL: http://issues.apache.org/jira/browse/NUTCH-5
>      Project: Nutch
>         Type: Bug
>   Components: searcher
>     Reporter: Andy Liu
>     Priority: Minor
>  Attachments: fix-hitlimiting.patch
>
> When re-searching for more raw hits, the first result of the next site is skipped.
> From NutchBean.java
> *snip*
>             // get the next raw hit
>             if (rawHitNum >= hits.getLength()) {
>                 // optimize query by prohibiting more matches on some excluded sites
>                 Query optQuery = (Query) query.clone();
>                 for (int i = 0; i < excludedSites.size(); i++) {
>                     if (i == MAX_PROHIBITED_TERMS) {
>                         break;
>                     }
>                     optQuery.addProhibitedTerm(((String) excludedSites.get(i)),
>                         IndexSearcher.HIT_LIMIT_FIELD);
>                 }
>                 numHitsRaw = (int) (numHitsRaw * RAW_HITS_FACTOR);
>                 LOG.info("re-searching for " + numHitsRaw +
>                     " raw hits, query: " + optQuery);
>                 hits = searcher.search(optQuery, numHitsRaw);
>                 LOG.info("found " + hits.getTotal() + " raw hits");
>                 rawHitNum = 0;
>                 continue;
>             }
> *snip*
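> rawHitNum is reset to 0, but the continue statement returns control to the enclosing for loop, which increments rawHitNum to 1 before the next iteration, so the hit at index 0 of the new result set is skipped.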
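
The behaviour described above can be reproduced outside Nutch. The following is a minimal sketch, not the NutchBean code and not necessarily what fix-hitlimiting.patch does: the class HitLimiterDemo, the methods collectBuggy and collectFixed, and the two string lists standing in for the original and re-searched hit sets are all hypothetical. It only shows that resetting the index to 0 before continue skips element 0, and that one possible remedy is to reset it to -1 so the loop increment lands on 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the loop in the report; not the actual NutchBean code.
class HitLimiterDemo {

    // Mirrors the reported pattern: reset the index to 0, then continue.
    static List<String> collectBuggy(List<String> first, List<String> researched) {
        List<String> out = new ArrayList<>();
        List<String> hits = first;
        boolean reSearched = false;
        for (int rawHitNum = 0; ; rawHitNum++) {
            if (rawHitNum >= hits.size()) {
                if (reSearched) {
                    break;              // no more hits to fetch in this sketch
                }
                hits = researched;      // stands in for searcher.search(optQuery, numHitsRaw)
                reSearched = true;
                rawHitNum = 0;
                continue;               // the loop update runs next, so rawHitNum becomes 1
            }
            out.add(hits.get(rawHitNum));
        }
        return out;
    }

    // One possible fix (an assumption): reset to -1 so the increment yields 0.
    static List<String> collectFixed(List<String> first, List<String> researched) {
        List<String> out = new ArrayList<>();
        List<String> hits = first;
        boolean reSearched = false;
        for (int rawHitNum = 0; ; rawHitNum++) {
            if (rawHitNum >= hits.size()) {
                if (reSearched) {
                    break;
                }
                hits = researched;
                reSearched = true;
                rawHitNum = -1;         // incremented to 0 by the loop update
                continue;
            }
            out.add(hits.get(rawHitNum));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "b");
        List<String> researched = Arrays.asList("c", "d", "e");
        System.out.println(collectBuggy(first, researched)); // prints [a, b, d, e]
        System.out.println(collectFixed(first, researched)); // prints [a, b, c, d, e]
    }
}
```

In the buggy variant "c", the first element of the second list, never reaches the output. An equivalent alternative would be to move the increment out of the for header and perform it only after a hit has been processed.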
