Posted to user@lucenenet.apache.org by Christopher Kolstad <ch...@ovitas.no> on 2007/05/02 11:30:03 UTC

Unexpected result from Lucene.Net index when searching for specific term

Hi.

I've been using Lucene.Net since version 1.4 and recently moved to
the 2.0004 build, which seems to work rather well.

After a rebuild of the code, a search for para* yields 453 hits, 9 of which
are excluded by later logic that matches URIs from the index to guarantee
unique results. If I immediately run another search for para*, it yields
exactly 1 hit. The same happens if I run some other search against the index
first and then try para*. At first I was sure my own exclusion code was
removing too much, but after a lot of logging and debugging it turns out that
the Lucene Hits object already contains only 1 hit right after the crucial
IndexSearcher.Search(query, ...) call. I'm at a loss as to why this happens.
I can't reproduce the issue with any other search, but I can reproduce it
after moving the code from one server to another.

Here's the relevant code (queryField gets set from a different method
depending on which index we're searching against):

StandardAnalyzer analyzer = new StandardAnalyzer();
QueryParser qp = new QueryParser(queryField, analyzer);
Query query;
int totalHits = 0; // Total hits
StringBuilder sb = new StringBuilder();
int docHits = 0;
Boolean scoreFound = false;
try
{
    // If there's a space in our search we want to enclose it in ""
    if (search.Contains(" "))
    {
        sb.Append("\"");
        sb.Append(search);
        sb.Append("\"");
    }
    else
    {
        sb.Append(search);
    }
    if (log.IsDebugEnabled)
    {
        log.Debug("Search: " + sb.ToString()); // Returns para*
    }
    query = qp.Parse(sb.ToString());
    if (log.IsDebugEnabled)
    {
        // Returns contents:para*
        log.Debug("Search made against index: " + query.ToString());
    }
    string weightToSetBoost = tRow.weight;
    double weightOfToken = double.Parse(weightToSetBoost, allowDecimal);

    query.SetBoost((float) weightOfToken);

    Hits hits = isearch.Search(query); // isearch is the IndexSearcher
    if (log.IsDebugEnabled)
    {
        // Returns 454 the first time, then 1 any other time for the query para*
        log.Debug("COUNTING HITS: " + hits.Length());
    }
    TreatHits(searchResult, maxHits, hits, originalToken, indexShort);
    // The TreatHits() method is where I thought the error was occurring,
    // but since the hit count is 1 as early as the Lucene Hits object, I'm
    // stumped
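
For anyone who wants to reproduce the query string outside the application, the quoting step from the excerpt above can be sketched on its own as follows. The class name QueryText and the method name Build are hypothetical names chosen for illustration and are not part of Lucene.Net. The sketch only mirrors the rule "wrap the input in double quotes if it contains a space" and does not escape quotes that are already in the input.

```csharp
using System;

public static class QueryText
{
    // Hypothetical helper (not a Lucene.Net API): returns the text that
    // would be handed to QueryParser.Parse in the excerpt above.
    public static string Build(string search)
    {
        if (search == null)
        {
            throw new ArgumentNullException("search");
        }

        // Input containing a space is wrapped in double quotes so that the
        // query parser treats it as a phrase; anything else passes through.
        return search.Contains(" ") ? "\"" + search + "\"" : search;
    }

    public static void Main()
    {
        Console.WriteLine(Build("para*"));      // prints: para*
        Console.WriteLine(Build("para graph")); // prints: "para graph"
    }
}
```

With the input para* the helper leaves the string untouched, which matches the "Search: para*" debug output noted in the excerpt.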


I've also tried this with the 003 build, and the same thing happens.

Let me know if any more information/code is needed.
Has anyone run into something similar and is willing to share their
experience/solution?

Christopher


--


Regards,
Christopher Kolstad
E-mail: chriswk@ifi.uio.no (University)
christopher.kolstad@gmail.com (Home)
chriswk@ovitas.no (Job)

Re: Unexpected result from Lucene.Net index when searching for specific term

Posted by Christopher Kolstad <ch...@ovitas.no>.
I've been testing a bit more. The Lucene index I'm working against was created
with Java Lucene 2.0, not Lucene.Net. I've also written a small test class to
see whether the error is reproduced outside my application, and it is: the
search returns only a single hit from the index (that is, hits.Length()
returns 1). I've tested the same index with Luke 0.7, and Luke returns
453 hits.

Here's my test code:

protected void Page_Load(object sender, EventArgs e)
{
    IndexSearcher isearch = new IndexSearcher("c:\\work\\indices\\fk");
    // Only search when a non-empty "search" request parameter is present
    if (Request["search"] != null && !Request["search"].Equals(""))
    {
        string query = Request["search"];
        QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
        Query qObject = qp.Parse(query);
        Hits hits = isearch.Search(qObject);
        // Write the ID, title and score of every hit to the page
        for (int i = 0; i < hits.Length(); i++)
        {
            Document doc = hits.Doc(i);
            Response.Write("Hit id: " + doc.Get("ID"));
            Response.Write("<br/>Title: " + doc.Get("title"));
            Response.Write("<br/>Score: " + hits.Score(i));
        }
    }
}

Any idea why Luke returns the correct number of hits while my code only
returns 1 hit?

Appreciate any help.



