Posted to solr-user@lucene.apache.org by Anurag <an...@gmail.com> on 2011/01/15 14:42:32 UTC

Implementing Fuzzy Search using OWA operator and Fuzzy Linguistic Quantifier

I am a Solr 1.3.0 user, and we have integrated Nutch with it. I want my
query types to be like
e.g. at least "some query", or
     most of "some query"

This is to be done through fuzzy search techniques. I know Solr has fuzzy
search using the ~ (tilde) sign on a term,
e.g. somequery~0.8
But we want queries of the above types because they make it convenient for
users to ask for what they really mean. Using AND, OR, and NOT is difficult
for users, who often do not know where to use AND and where to use OR
between the terms.

Does anyone have any idea how to proceed with implementing this?

Thanks 

-----
Kumar Anurag

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Implementing-Fuzzy-Search-using-OWA-operator-and-Fuzzy-Linguistic-Quantifier-tp2261469p2261469.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Implementing Fuzzy Search using OWA operator and Fuzzy Linguistic Quantifier

Posted by Anurag <an...@gmail.com>.

Please reply. I need help from you all.

-----
Kumar Anurag


Re: Implementing Fuzzy Search using OWA operator and Fuzzy Linguistic Quantifier

Posted by Anurag <an...@gmail.com>.

I have some sample code, written using Lucene, that implements this. The
code is not final and needs many modifications. Now I want to embed it in
Solr. How can this be done?

The code is below:
//package lia.searching;
import java.util.Arrays;
import java.util.Collections;
//import org.apache.lucene.analysis.standard.StandardAnalyzer;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.FSDirectory;

public class Explainer {

    public static void reverse(float[] array) {
      if (array == null) {
          return;
      }
      int i = 0;
      int j = array.length - 1;
      float tmp;
      while (j > i) {
          tmp = array[j];
          array[j] = array[i];
          array[i] = tmp;
          j--;
          i++;
      }
  }

    // Linguistic quantifier Q(r): 0 below a, linear between a and b, 1 above b.
    public static float fun(float r ,float a ,float b )
    {
        if(r<a) return 0.0f;
        else if(a<=r&&r<=b) return (r-a)/(b-a);
        else if(r>b) return 1.0f;
        return 0.0f; 
   }
      
  public static void main(String[] args) throws Exception {
   
   if (args.length < 3) {
      System.err.println("Usage: Explainer <index dir> <option> <query terms...>");
      System.exit(1);
    }
  
   
    String indexDir = args[0];
    String options = args[1]; //atleasthalf, most, asmanyaspossible
    String[] queryExpression=new String[args.length-2];
    for(int i=2;i<args.length;i++)
    {
      String tmp=args[i];
      queryExpression[i-2]=tmp;
      System.out.println(queryExpression[i-2]);
    }
    


    FSDirectory directory =
        FSDirectory.getDirectory(indexDir, false);
 
     int TOTAL_DOC=12483;//total document=12484

     float[][] R=new float[TOTAL_DOC][10]; //Relevancy Matrix
     float[][] QR= new float[TOTAL_DOC][10]; //Query Relevancy matrix
     // Weights for quantifiers like "at least half of" or "most",
     // calculated from the formula w(i) = Q(i/m) - Q((i-1)/m), with Q(0) = 0.
     float[] Weight=new float[10];

     //calculating weights for the terms
    float a=0.0f,b=0.5f;
    if(options.equals("atleasthalf")){a=0.0f;b=0.5f;}
    else if(options.equals("most")){a=0.3f;b=0.8f;}
    else if(options.equals("asmanyaspossible")){a=0.5f;b=1.0f;System.out.println("3rd");}

       
    int m=args.length-2;
    for(int i=2,j=1;i<args.length;i++,j++)
     {
       float f1=(float)j/m;
       float f2=(float)(j-1)/m;
         
       Weight[i-2]=fun(f1,a,b)-fun(f2,a,b);
       System.out.print(Weight[i-2]+" ");
     }
     
      System.out.println();
     for(int start=0;start<queryExpression.length;start++)
     {
     
///////////////////////////////////////////////////////////////////////////////////
    QueryParser queryParser = new QueryParser("content", new SimpleAnalyzer());//added
     //QueryParser queryParser = new QueryParser("content", new StandardAnalyzer());
     
    // The old syntax, parse(queryExpression, "contents", new StandardAnalyzer()), does not work.
    Query query = queryParser.parse(queryExpression[start]);

//////////////////////////////////////////////////////////////////////////////////


    //System.out.println("Query: " + queryExpression);

    IndexSearcher searcher = new IndexSearcher(directory);
    Hits hits = searcher.search(query);

    System.out.println("total hits="+hits.length());

     
    /*//////////////////////////////////////////
    Similarity sim = searcher.getSimilarity();
    /*//////////////////////////////////////

    

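    // hits is ordered by rank, so map each hit back to its Lucene document id
    // before storing the score; rows of R are indexed by document id, and
    // documents that do not match keep the default score 0.0f.
    for (int i = 0; i < hits.length(); i++) {
     // Explanation explanation =
                              //searcher.explain(query, hits.id(i));

      //System.out.println("----------");
      int docId = hits.id(i);
      if (docId >= TOTAL_DOC) continue; // R has only TOTAL_DOC rows
      Document doc = hits.doc(i);
      System.out.println(doc.get("title"));
      R[docId][start]=hits.score(i);
      System.out.println(R[docId][start]);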
      ////////////////////////////////////
      //System.out.println(hits.score(i));//working
      ////////////////////////////////////



      /*////////////////////////////////////
      int docId = hits.id(i);
          int freq = doc.freq();

      TermFreqVector vector = knownSearcher.reader.getTermFreqVector(doc, "field");
          float tf = sim.tf(freq);

         float idf = sim.idf(term, knownSearcher);

       /*//////////////////////////////////////
      


      /* 
      String tmp=explanation.toString();
     
      String tmp2[]=tmp.split(" ");
      for(int l=0;l<tmp2.length;l++)
      {
         float t;
         try{
             t=Float.parseFloat(tmp2[l]);
             System.out.println(t);
            }
            catch(Exception e){}

       }//for

       */
    }//for

    }//for

   



  //sort the relevancy matrix and multiply with weights
  
   for(int i=0;i<TOTAL_DOC;i++)
   {
    
      Arrays.sort(R[i]);
      reverse(R[i]);

       for(int j=0;j<1;j++)
       {
           QR[i][j]=0.0f;
       
          for(int k=0;k<queryExpression.length;k++)
          {
       
               QR[i][j]+=(R[i][k]*Weight[k]);
               
          }
       }
    }

    
  //print the scores of the final documents
  System.out.println("Final Scores of Documents");
 
  IndexReader reader = IndexReader.open(directory);
 
  float max=0.0f,min=0.9999f;
  int num1=0,num2=0;
  String ds="",ds2="";
 for(int i=0;i<TOTAL_DOC;i++)
  {
     try{
     Document d = reader.document( i);
     
     System.out.println("Document "+d.get("title").toString()+"score= "+QR[i][0]);
     if(QR[i][0]>max) {num1=i;max=QR[i][0];ds=d.get("title").toString();}
     if(QR[i][0]<min&&QR[i][0]>=0.1f) {num2=i;min=QR[i][0];ds2=d.get("title").toString();}
     }catch(Exception e){}
     //Thread.sleep(100);
    System.out.println(i);
 }

  System.out.println(num1+" "+ds+" "+max);
  System.out.println(num2+" "+ds2+" "+min);
  }//main
}//class
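
The aggregation step of the program above can be looked at without Lucene.
The following is only a minimal sketch, assuming the per-term relevance
scores of one document are already available as a float array. The class
name OwaSketch and its method names are made up for illustration and are not
part of Lucene or Solr; the (a, b) pair in main is the "atleasthalf" setting
from the program above.

```java
import java.util.Arrays;

public class OwaSketch {

    // Piecewise-linear quantifier Q(r): 0 below a, 1 above b, linear in between.
    public static float quantifier(float r, float a, float b) {
        if (r < a) return 0.0f;
        if (r > b) return 1.0f;
        return (r - a) / (b - a);
    }

    // OWA weights: w_i = Q(i/m) - Q((i-1)/m) for i = 1..m.
    public static float[] weights(int m, float a, float b) {
        float[] w = new float[m];
        for (int i = 1; i <= m; i++) {
            w[i - 1] = quantifier((float) i / m, a, b)
                     - quantifier((float) (i - 1) / m, a, b);
        }
        return w;
    }

    // OWA aggregation: order the scores from largest to smallest,
    // then take the weighted sum with the position-based weights.
    public static float aggregate(float[] scores, float[] w) {
        float[] sorted = scores.clone();
        Arrays.sort(sorted); // ascending order
        float sum = 0.0f;
        for (int k = 0; k < sorted.length; k++) {
            sum += sorted[sorted.length - 1 - k] * w[k]; // k-th largest score
        }
        return sum;
    }

    public static void main(String[] args) {
        float[] w = weights(4, 0.0f, 0.5f); // "atleasthalf" with four terms
        System.out.println(Arrays.toString(w));
        System.out.println(aggregate(new float[] {0.2f, 0.8f, 0.0f, 0.6f}, w));
    }
}
```

Because the weights are tied to positions in the sorted score list rather
than to particular terms, "atleasthalf" with four terms rewards a document
for its two best-matching terms, whichever terms those are.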

-----
Kumar Anurag


Re: Implementing Fuzzy Search using OWA operator and Fuzzy Linguistic Quantifier

Posted by Anurag <an...@gmail.com>.

The query will be like
1. " at least (lucene) "
2. " mostly (solr) "
3. Q(query)="most"(t1,t2,t3,t7)       where t1,t2,t3,t7 are terms
4. Q=(t1,0.9) and {(t2,0.5) or (t3,0.7)}

Actually, the purpose is to expand the query types entered by the user.
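
For form 3, one possible way to split the quantifier from the terms before
the terms are handed to the search is sketched below. The input syntax is
taken from the list above, but the class name QuantifiedQuery and the
regular expression are assumptions made for illustration; this is not an
existing Solr or Lucene query syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuantifiedQuery {

    // Hypothetical input form: "quantifier"(term1,term2,...)
    private static final Pattern FORM =
        Pattern.compile("\\s*\"([^\"]+)\"\\s*\\(([^)]*)\\)\\s*");

    public final String quantifier;
    public final List<String> terms;

    private QuantifiedQuery(String quantifier, List<String> terms) {
        this.quantifier = quantifier;
        this.terms = terms;
    }

    // Splits the input into the quoted quantifier and the comma-separated terms.
    public static QuantifiedQuery parse(String input) {
        Matcher m = FORM.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a quantified query: " + input);
        }
        List<String> terms = new ArrayList<String>();
        for (String t : m.group(2).split(",")) {
            String trimmed = t.trim();
            if (!trimmed.isEmpty()) {
                terms.add(trimmed); // skip empty entries such as "t1,,t2"
            }
        }
        return new QuantifiedQuery(m.group(1).trim(), terms);
    }

    public static void main(String[] args) {
        QuantifiedQuery q = parse("\"most\"(t1,t2,t3,t7)");
        System.out.println(q.quantifier + " -> " + q.terms);
    }
}
```

The quantifier string could then select the (a, b) pair for the weights, and
each term could be searched separately to obtain its per-document score.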


On Sun, Jan 16, 2011 at 12:55 AM, iorixxx [via Lucene] <ml...@n3.nabble.com> wrote:

> > want my query types
> > to be like
> > eg. at least "some query", or
> >      most of "some query"
>
> Can you elaborate more? It is not clear what you want.

-----
Kumar Anurag


Re: Implementing Fuzzy Search using OWA operator and Fuzzy Linguistic Quantifier

Posted by Ahmet Arslan <io...@yahoo.com>.

> want my query types
> to be like
> eg. at least "some query", or
>      most of "some query"

Can you elaborate more? It is not clear what you want.